CONFIDENCE BANDS IN DENSITY ESTIMATION

By Evarist Giné and Richard Nickl
University of Connecticut and University of Cambridge 

Given a sample from some unknown continuous density $f:\mathbb{R}\to\mathbb{R}$,
we construct adaptive confidence bands that are honest for all densities
in a "generic" subset of the union of $t$-Hölder balls, $0<t\le r$,
where $r$ is a fixed but arbitrary integer. The exceptional ("nongeneric")
set of densities for which our results do not hold is shown
to be nowhere dense in the relevant Hölder-norm topologies. In the
course of the proofs we also obtain limit theorems for maxima of linear
wavelet and kernel density estimators, which are of independent
interest.

1. Introduction. Let $X_1,\dots,X_n$ be i.i.d. random variables with uniformly
continuous density $f:\mathbb{R}\to\mathbb{R}$, and let $f_n$ be some density estimator
for $f$. A natural loss function to assess the statistical performance of $f_n$
is sup-norm loss $d_\infty(f_n,f)=\sup_x|f_n(x)-f(x)|$: it gives a clear geometric
interpretation of the estimation error, suggesting heuristically the existence
of a "band" around $f_n$ that shrinks at rate $d_\infty(f_n,f)$ and contains $f$ with
probability close to one.

Classical methods to construct confidence bands in density estimation,
for example, the ones based on extreme value theory in Smirnov (1950)
for histogram estimators and in Bickel and Rosenblatt (1973) for kernel
estimators, require $f$ to satisfy stringent differentiability assumptions, and,
more importantly, are based on a priori knowledge of the degree of smoothness
of $f$. Recent developments in adaptive function estimation show that
one can find purely data-driven estimators $f_n$ such that $d_\infty(f_n,f)$ achieves
the minimax-optimal rate of convergence $r_n(t)=(n/\log n)^{-t/(2t+1)}$ for
estimating a density $f$ in a given $t$-Hölder ball [see Giné and Nickl (2009a,
2009b, 2010) in the i.i.d. density model on $\mathbb{R}$ and Goldenshluger and Lepski
(2009) in the Gaussian white noise model]. The question then arises as to
how one can take advantage of these adaptive rate of convergence results for
statistical inference, in particular for the construction of "adaptive" confidence
bands.
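
To give a feeling for how strongly the rate $r_n(t)=(n/\log n)^{-t/(2t+1)}$ depends on the unknown smoothness $t$, the following minimal Python sketch evaluates it for a few values of $t$. The function name `minimax_rate` and the sample size used are illustrative assumptions and do not come from the paper.

```python
import math


def minimax_rate(n: int, t: float) -> float:
    """Sup-norm minimax rate r_n(t) = (n / log n)^(-t / (2t + 1))."""
    if n < 3 or t <= 0:
        raise ValueError("need n >= 3 and t > 0")
    return (n / math.log(n)) ** (-t / (2.0 * t + 1.0))


if __name__ == "__main__":
    n = 10_000  # hypothetical sample size, chosen only for illustration
    for t in (0.5, 1.0, 2.0):
        print(f"t = {t}: r_n(t) = {minimax_rate(n, t):.4f}")
```

A band that is calibrated for the wrong $t$ is either too wide or fails to cover, which is why the smoothness has to be learned from the data.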

Let us phrase our problem more precisely: The results in Giné and Nickl
(2009a, 2009b, 2010) show that the natural class of densities over which the
minimax-optimal rate of convergence $r_n(t)$ can be achieved in $d_\infty$-loss in an
adaptive way is

$$\mathcal{P}^{\mathrm{all}}:=\mathcal{P}^{\mathrm{all}}(L)=\bigcup_{0<t\le r}\{f:\mathbb{R}\to\mathbb{R}\ \text{is a probability density contained in}\ \Sigma(t,L)\},$$

where $\Sigma(t,L)$ is a ball of radius $L$ in the usual Hölder space on $\mathbb{R}$ and where
the integer $r$ measures the degree of "regularity" of the kernel or wavelet basis
used. Given $\alpha>0$ and a family of densities $\mathcal{P}\subseteq\mathcal{P}^{\mathrm{all}}$, an honest confidence
band over the interval $[a,b]$ is a family of random intervals $C_n(y):=C_n(y,\alpha)$,
$y\in[a,b]$, such that the asymptotic coverage inequality

$$\liminf_n\inf_{f\in\mathcal{P}}\Pr_f\big(f(y)\in C_n(y)\ \text{for all}\ y\in[a,b]\big)\ge1-\alpha\tag{1.1}$$

holds, and, following Cai and Low (2004), we shall say that $C_n(y)$ is adaptive
if for every $t$, $\varepsilon>0$ there exists $L'$ finite such that [$\ell(I)$ denoting the length
of the interval $I$]

$$\sup_{f\in\Sigma(t,L)\cap\mathcal{P}}\Pr_f\Big(\sup_{y\in[a,b]}\ell(C_n(y))\ge L'\bar r_n(t)\Big)\le\varepsilon,\tag{1.2}$$

where $\bar r_n(t)$ equals $r_n(t)$, possibly inflated by a multiplicative logarithmic
penalty.

It follows, on the one hand, from results in Low (1997) that confidence
bands that are simultaneously adaptive and honest do not exist for $\mathcal{P}=\mathcal{P}^{\mathrm{all}}$.
On the other hand we shall show that honest and adaptive confidence bands
do exist over "generic" subsets $\mathcal{P}$ of $\mathcal{P}^{\mathrm{all}}$. The subset $\mathcal{P}$ of $\mathcal{P}^{\mathrm{all}}$ for which
our results hold is "generic" in the following sense:

(A) $\mathcal{P}$ contains all "smooth" densities in $\mathcal{P}^{\mathrm{all}}$, that is, all densities in $\mathcal{P}^{\mathrm{all}}$
that are $r$-times differentiable on $\mathbb{R}$;

(B) The minimax rate of convergence over $\Sigma(t,L)\cap\mathcal{P}$ is the same as the
one over $\Sigma(t,L)\cap\mathcal{P}^{\mathrm{all}}$;

(C) The class of densities excised from $\mathcal{P}^{\mathrm{all}}$ is "negligible" in the sense
that the set $\mathcal{P}^{\mathrm{all}}\setminus\mathcal{P}$ is "topologically small."

Roughly speaking "topologically small" will mean that, given any $t>0$,
$\mathcal{P}$ can be chosen so large that the exceptional set $\mathcal{P}^{\mathrm{all}}\setminus\mathcal{P}$ contains no (given)
ball of $\Sigma(t,L)\cap\mathcal{P}^{\mathrm{all}}$. If one relaxes the uniformity (or "honesty") requirement
in (1.1) for the sake of illustration, our results will imply that $\mathcal{P}$ can be
chosen so large that $\mathcal{P}^{\mathrm{all}}\setminus\mathcal{P}$ is nowhere dense in the (relative) Hölder-norm
topology in $\Sigma(t,L)\cap\mathcal{P}^{\mathrm{all}}$ (again for every $t$). It should furthermore be noted
that, although we state (B) separately, it typically follows from (C).

We will construct fully data-driven adaptive (nonlinear) estimators $f_n$
based on either wavelets or convolution kernels, and prove, uniformly over
such a "generic" set $\mathcal{P}$, a "Smirnov-Bickel-Rosenblatt"-type limit theorem:
for any bounded interval $[a,b]$,

$$A_n\Big(\sup_{y\in[a,b]}\Big|\frac{f_n(y)-f(y)}{\sigma_n\sqrt{f_n(y)}}\Big|-B_n\Big)\to_d Z\tag{1.3}$$

as $n\to\infty$ where $Z$ is a Gumbel random variable and where the random
(but known) constants $\sigma_n$, $A_n$, $B_n$ have the right stochastic order to obtain
a confidence band from (1.3) that shrinks, up to a logarithmic penalty, at the
minimax rate $r_n(t)$ of estimation. The estimator we propose is of "Lepski"-type
and not difficult to implement. It is in principle possible to replace the
interval $[a,b]$ by $\mathbb{R}$, by using suitable weight functions and techniques from
Giné, Koltchinskii and Sakhanenko (2004), but this comes at the expense
of much more technical proofs, so we abstain from it. See Section 3 for the
exact statements of our results.

There has been substantial and deep recent work about the connection 
between confidence sets and rates of convergence of adaptive estimators. As 
mentioned above, Low (1997) shows some limitations for pointwise confi- 
dence intervals in density estimation. Our "generic" conditions circumvent 
his "pathologies." The paper closest to the present one is Picard and Tribouley
(2000) where pointwise adaptive confidence intervals are constructed
in regression and Gaussian white noise. Our proof strategy is partially inspired
by theirs, and our Condition 3 is somewhat similar to their condition
$H_s(M,x_0)$ which, however, is less "generic" in the pointwise setup. Cai
and Low (2004) develop a general theory for pointwise confidence intervals, 
which can be conceptually (but not directly) related to the sup-norm case. 
Genovese and Wasserman (2008) revisit the negative results by Low (1997) 
in the framework of regression. They suggest that valid confidence sets are 
possible if the usual notion of "coverage" is replaced by "surrogate cover- 
age," but it is not clear yet how "generic" this restriction is. There is also a 
remarkable literature on confidence sets in $L^2$-loss where the theory is somewhat
different to the sup-norm/pointwise case, although the general message
that "adaptive rates of convergence" do not simply translate into "adaptive 
confidence sets" is unchanged. We refer to Li (1989), Beran and Dümbgen
(1998), Hoffmann and Lepski (2002), Juditsky and Lambert-Lacroix (2003),
Baraud (2004), Genovese and Wasserman (2005), Cai and Low (2006) and
Robins and van der Vaart (2006). Another interesting approach is based on
imposing qualitative shape constraints on the function to be estimated. Here
some positive results are possible; we refer to Hengartner and Stark (1995),
Dümbgen (2003) and Davies, Kovac and Meise (2009).

Most of the above literature is set in the Gaussian white noise model, but 
we prefer to derive our results in the i.i.d. density model, mostly for two rea- 
sons: First the asymptotic equivalence of white noise to density estimation 
only holds under quite restrictive assumptions on the underlying density; in 
particular we are interested in the low regularity case t < 1/2 as well. Second 
the problem of estimating a continuous density on $\mathbb{R}$ carries some specific
structure that should be taken into account; such a density cannot be constant
everywhere; neither can differentiable densities on $\mathbb{R}$ have derivatives
that are everywhere zero, facts that play a role in the verification of some 
of our conditions. 

The limit (1.3) is based on conditions that require certain centered linear
wavelet or kernel estimators to satisfy a "Smirnov-Bickel-Rosenblatt"-type
limit theorem, uniformly in the underlying density $f$. While these results can
be obtained, as we show, for convolution kernel estimators along the lines
of Bickel and Rosenblatt (1973), using refinements from Giné, Koltchinskii
and Sakhanenko (2004), results of this type do not exist at the moment
for wavelet density estimators. It turns out that a reduction to Gaussian
processes similar to the one for kernel estimators first proposed by Bickel
and Rosenblatt (1973) can be proved for wavelets as well (see Proposition
5), but the resulting Gaussian process, which equals

$$Y(t)=\int_{\mathbb{R}}K(t,s)\,dW(s),$$

where $K$ is the wavelet projection kernel and $W$ is Brownian motion, turns
out to be nonstationary, so that the classical extreme value theory for
stationary Gaussian processes [Leadbetter, Lindgren and Rootzén (1983)]
does not apply here. However, these Gaussian processes are cyclostationary
[meaning that the covariance function $r(t,t+v)$ is periodic in $t$ with the same
period for all $v$], and the extreme value theory for these and related processes
has recently attracted some interest in the literature [see Konstant and
Piterbarg (1993), Piterbarg and Seleznjev (1994), Hüsler (1999) and Hüsler,
Piterbarg and Seleznjev (2003)]. Using these techniques and wavelet theory
we can establish, as a first step, limit theorems for suprema of centered
wavelet density estimators based on Battle-Lemarié wavelets [which can be
computed numerically as spline projections, cf. Giné and Nickl (2010)]. We
believe that this proof strategy should also work for other wavelet bases, but
we currently do not have enough knowledge about the analytical properties
of the covariance functions of the processes $Y(t)$ in the case of, for instance,
Daubechies wavelets, to succeed in doing so. This remains an open problem.

We finally remark that the results in this article are clearly of an asymptotic
(and hence "theoretical") nature; they show that adaptive confidence
bands are possible for large sample sizes and in a certain "generic" sense, 
but we do not advocate the use of our bands in practice without a thorough 
investigation of their finite sample properties. 

2. Basic notation and definitions. 

2.1. Wavelets and function spaces. For a function $H:M\to\mathbb{R}$ we shall
denote by $\|H\|_M$ the quantity $\sup_{m\in M}|H(m)|$, but we shall write $\|H\|_\infty:=\|H\|_{\mathbb{R}}$.
Denote further by $C(\mathbb{R})$ the space of bounded continuous functions
on $\mathbb{R}$ normed by $\|\cdot\|_\infty$.

We next define the function spaces that will be at the heart of our statistical
problem. It is convenient to define them in terms of wavelets; more
classical equivalent definitions can be found in the literature (see Remark
1 below). Throughout this paper we shall use the by now standard wavelet
theory; we refer, for example, to the monograph Härdle et al. (1998) for an
excellent treatment of the statistically most relevant material.
In particular we shall say that the scaling function $\phi$ of a multiresolution
analysis is $s$-regular if $\phi$ is $s$-times weakly differentiable, and, for $0\le\alpha\le s$,
$D^\alpha\phi$ satisfies $|D^\alpha\phi(x)|\le c_1\lambda^{c_2|x|}$ for (almost) every $x\in\mathbb{R}$, some $c_1,c_2>0$
and some $0<\lambda<1$.

Definition 1. Let $0<t<s$, $t\in\mathbb{R}$, $s\in\mathbb{N}$. Let $\phi$ be a scaling function
that is $s$-regular; let $\psi$ be the associated mother wavelet and denote by
$\alpha_k(f)$ and $\beta_{lk}(f)$, $k\in\mathbb{Z}$, $l\in\mathbb{N}$, the associated wavelet coefficients of the
function $f:\mathbb{R}\to\mathbb{R}$. The Hölder-Zygmund space $C^t(\mathbb{R})$ is defined as the set
of functions

$$C^t(\mathbb{R}):=\Big\{f\in C(\mathbb{R}):\|f\|_{t,\infty}:=\sup_{k\in\mathbb{Z}}|\alpha_k(f)|+\sup_{l\ge0}\sup_{k\in\mathbb{Z}}\big|2^{l(t+1/2)}\beta_{lk}(f)\big|<\infty\Big\}.$$

We should note that this definition is independent of the wavelet basis
used: any wavelet basis of regularity $s>t$ generates the same space.

Remark 1. It is a standard result in wavelet theory [e.g., Chapter
6.4 in Meyer (1992)] that $C^t(\mathbb{R})$ is equal, with equivalent norms, to the
classical Hölder-Zygmund spaces, defined as follows: For $0<t<1$ define
$C^t(\mathbb{R})$ to be the space of functions $f\in C(\mathbb{R})$ for which
$\|f\|'_{t,\infty}:=\|f\|_\infty+\sup_{x\ne y,\,x,y\in\mathbb{R}}(|f(x)-f(y)|/|x-y|^t)$ is finite.
For noninteger $t>1$ the space $C^t(\mathbb{R})$ is defined by requiring $D^{[t]}f$ of
$f\in C(\mathbb{R})$ to exist and to be contained in $C^{t-[t]}(\mathbb{R})$. The Zygmund class
$C^1(\mathbb{R})$ is defined by requiring $|f(x+y)+f(x-y)-2f(x)|\le C|y|$ for all
$x,y\in\mathbb{R}$, some $0<C<\infty$ and $f\in C(\mathbb{R})$,
and the case $m<t\le m+1$ follows by requiring the same condition on the
$m$th derivative of $f$. It is then also clear that $C^m(\mathbb{R})$ contains the spaces of
$m$-times continuously differentiable functions with $m$ bounded derivatives.
We remark finally that $C^t(\mathbb{R})$ is a special case of the scale of Besov spaces,
namely $B^t_{\infty\infty}(\mathbb{R})$.

2.2. Density estimation using convolution kernels or wavelets. Let $X_1,\dots,X_n$
be i.i.d. random variables with common law $P$ and density $f$ on $\mathbb{R}$, and
split the sample into two parts, $S_1$ and $S_2$, of sizes $n_1$ and $n_2$, respectively,
in such a way that $n_1/n_2$ is bounded away from zero and infinity as
$n\to\infty$. Denote by

$$P_{n_1}=\frac{1}{n_1}\sum_{i=1}^{n_1}\delta_{X_i},\qquad P_{n_2}=\frac{1}{n_2}\sum_{i=1}^{n_2}\delta_{X_{n_1+i}}$$

the empirical measures associated with the first and the second subsample.
We take the $X_i$'s to be coordinate projections of the infinite product
probability space $\mathbb{R}^{\mathbb{N}}$ with its product sigma-algebra and denote by $\Pr_f$ the
product probability measure on this space.

We will consider two types of preliminary linear estimators: Define first the
classical kernel density estimator based on the sample $S_v$, $v=1,2$, namely

$$f^K_{n_v}(y,h)=\frac{1}{h}\int_{\mathbb{R}}K\Big(\frac{y-x}{h}\Big)\,dP_{n_v}(x),\qquad y\in\mathbb{R},$$

where $K:\mathbb{R}\to\mathbb{R}$ is a kernel and $h>0$ is some bandwidth. An alternative
estimator is based on a wavelet projection: If $\phi$ is a scaling function (father
wavelet) and $\psi$ the associated (mother) wavelet, then the linear wavelet
estimator based on the sample $S_v$, $v=1,2$, is

$$f^W_{n_v}(y,j)=2^j\int_{\mathbb{R}}K(2^jy,2^jx)\,dP_{n_v}(x)
=\sum_k\hat\alpha_k(v)\phi(y-k)+\sum_{l=0}^{j-1}\sum_k\hat\beta_{lk}(v)\psi_{lk}(y),\qquad y\in\mathbb{R},\ j\in\mathbb{N},$$

where $K(y,x)=\sum_k\phi(y-k)\phi(x-k)$ is the wavelet projection kernel,
$\psi_{lk}(x)=2^{l/2}\psi(2^lx-k)$ and where the empirical wavelet coefficients are

$$\hat\alpha_k(v)=\int_{\mathbb{R}}\phi(x-k)\,dP_{n_v}(x),\qquad\hat\beta_{lk}(v)=\int_{\mathbb{R}}\psi_{lk}(x)\,dP_{n_v}(x).$$

To unify the notation for both estimators we convert the bandwidth $h$ into
$2^{-j}$ so that the kernel-type density estimator is given by

$$f_{n_v}(y,j)=2^j\int_{\mathbb{R}}K(2^jy,2^jx)\,dP_{n_v}(x),\tag{2.1}$$

where $K(y,x)$ is either the wavelet projection kernel or the convolution
kernel $K(y-x)$. In this way the estimator is defined also for noninteger
$j$. We will use the convention $K_j(y,x)=2^jK(2^jy,2^jx)$, and we denote the
expectation of $f_{n_v}(y,j)$ by

$$Ef_{n_v}(y,j)=\int_{\mathbb{R}}K_j(y,x)f(x)\,dx=K_j(f)(y).\tag{2.2}$$
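
For the convolution-kernel case, the estimator in (2.1) is simply a kernel density estimator with bandwidth $2^{-j}$. The following is a minimal sketch under stated assumptions: the boxcar kernel $1_{[-1/2,1/2]}$ is used purely for illustration, and the names `boxcar_kernel` and `kernel_estimator` are hypothetical and not taken from the paper.

```python
import numpy as np


def boxcar_kernel(u):
    """K = indicator of [-1/2, 1/2]: symmetric and integrates to one."""
    return (np.abs(u) <= 0.5).astype(float)


def kernel_estimator(y, sample, j, kernel=boxcar_kernel):
    """Evaluate 2^j * mean_i K(2^j (y - X_i)), i.e. a KDE with bandwidth 2^{-j}.

    `j` may be noninteger, as allowed by the unified notation.
    """
    y = np.atleast_1d(np.asarray(y, dtype=float))
    sample = np.asarray(sample, dtype=float)
    scale = 2.0 ** j
    # Rows index evaluation points, columns index observations.
    return scale * kernel(scale * (y[:, None] - sample[None, :])).mean(axis=1)
```

Calling `kernel_estimator(grid, sample, j)` on a NumPy grid returns the estimate at each grid point; increasing `j` narrows the bandwidth and reduces smoothing.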

We shall make the following standard assumption on the kernel K which 
will have to be strengthened for some results. 



Condition 1. Let $r\in\mathbb{N}$, $r\ge1$. Suppose one of the following conditions
is satisfied:

(a) (convolution kernel) let $K(x,y)=K(x-y)$ where $K:\mathbb{R}\to\mathbb{R}$ is symmetric,
integrable, of bounded variation and integrates to one. Assume furthermore
that $\int_{\mathbb{R}}K(u)u^l\,du=0$ for $l=1,\dots,r-1$ (vacuous if $r=1$) as well as
$\int_{\mathbb{R}}|K(u)||u|^r\,du<\infty$;

(b) (wavelet kernel) (i) let $K(x,y)=\sum_k\phi(x-k)\phi(y-k)$ where $\phi$ is
a scaling function that is of bounded variation, compactly supported and
either $\phi$ is $(r-1)$-regular or $\psi$ satisfies $\int_{\mathbb{R}}\psi(u)u^l\,du=0$ for every $0\le l\le r-1$;

(c) (wavelet kernel) (ii) let $K(x,y)=\sum_k\phi_r(x-k)\phi_r(y-k)$ where $\phi_r$ is
the Battle-Lemarié scaling function (defined in Section 4.3.1).



3. Adaptive confidence bands. 



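3.1. Estimate of the resolution level. We use the sample $S_2$ to choose the
resolution level $j$. For integers $r\ge1$, $n_2>1$, choose integers $j_{\min}:=j_{\min,n}$
and $j_{\max}:=j_{\max,n}$, $0<j_{\min}<j_{\max}$, such that

$$2^{j_{\min}}\simeq\Big(\frac{n_2}{\log n_2}\Big)^{1/(2r+1)},\qquad 2^{j_{\max}}\simeq\frac{n_2}{(\log n_2)^2},$$

and set

$$\mathcal{J}:=\mathcal{J}_n=[j_{\min},j_{\max}]\cap\mathbb{N}.$$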

We note in advance that $j_{\min}$ is the resolution level we would choose if we
knew that the unknown density $f$ is $r$-times continuously differentiable, and
we are not trying to adapt to densities smoother than this. (This does not
mean that we rule out densities that are very smooth; it just means that we
live with a "nonadaptive" rate of convergence in these cases.) On the other
hand, $j_{\max}$ is the resolution level that just produces uniform consistency of
the linear estimator $f_{n_2}(j_{\max})$ if $f$ is bounded and uniformly continuous. So
our problem is to adapt to the unknown smoothness $t$ of $f$ where $t$ varies
between $0$ and $r$.

Our data-driven choice $j_n$ for the resolution level is of "Lepski-type" and
is based on the subsample $S_2$; namely

$$j_n=\min\Big\{j\in\mathcal{J}:\|f_{n_2}(j)-f_{n_2}(l)\|_\infty\le M\sqrt{\frac{2^ll}{n_2}}\ \ \forall l>j,\ l\in\mathcal{J}\Big\},\tag{3.1}$$

where $M=M'\sqrt{\|f\|_\infty\vee1}$ with $M':=M'(K)$, a constant that depends only
on $K$. We discuss in Remark 2 below the choice of $M'$ as well as how one
can circumvent having to know $\|f\|_\infty$ in practice.
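
The Lepski-type rule can be sketched in a few lines. The sketch below rests on several assumptions that are not in the paper: the threshold is taken to have the form $M\sqrt{2^ll/n_2}$, the sup-norm over $\mathbb{R}$ is approximated by a maximum over a finite evaluation grid, a boxcar convolution kernel is used, and the name `lepski_level` is hypothetical.

```python
import numpy as np


def lepski_level(sample2, j_min, j_max, M, grid):
    """Smallest j in {j_min, ..., j_max} such that, for every l > j in the grid,
    max_y |f(y, j) - f(y, l)| <= M * sqrt(2^l * l / n2).

    f(., j) is a boxcar-kernel estimator with bandwidth 2^{-j}; the maximum is
    taken over the finite evaluation grid (an approximation of the sup-norm).
    """
    if j_min > j_max:
        raise ValueError("need j_min <= j_max")
    x = np.asarray(sample2, dtype=float)
    grid = np.asarray(grid, dtype=float)
    n2 = x.size
    levels = list(range(j_min, j_max + 1))

    # Precompute the linear estimators at every resolution level.
    est = {}
    for j in levels:
        s = 2.0 ** j
        est[j] = s * (np.abs(s * (grid[:, None] - x[None, :])) <= 0.5).mean(axis=1)

    for j in levels:
        accepted = all(
            np.max(np.abs(est[j] - est[l])) <= M * np.sqrt(2.0 ** l * l / n2)
            for l in levels
            if l > j
        )
        if accepted:
            return j  # j_max is always accepted: it has no finer level to compare
    return j_max
```

A large constant `M` makes every comparison pass, so the coarsest level is selected; a small `M` pushes the selection toward `j_max`, which is why the choice of the constant matters in practice.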

A remark on the choice of $j_n$ is in order. Our proofs will imply, for
$0<t\le r$, the adaptive global (minimax-optimal) risk bound

$$\sup_{f:\|f\|_{t,\infty}\le L}E\sup_{y\in\mathbb{R}}|f_{n_1}(y,j_n)-f(y)|=O\Big(\Big(\frac{\log n}{n}\Big)^{t/(2t+1)}\Big).$$

To make inferential use of this result we will use the estimator $f_{n_1}(j_n)$ (in
fact a slight modification of it) as the center of a confidence band for the
unknown density $f$ on the interval $[a,b]$. Under certain assumptions on $f$
our confidence band will be shown to be both honest and adaptive for arbitrary
bounded intervals $[a,b]$ (although we prove our result, w.l.o.g., only
for $[a,b]=[0,1]$). If one starts with a fixed interval $[a,b]$, one may alternatively
try to choose the resolution level $j_n$ above depending only on values
of $f_{n_2}(j)$, $f_{n_2}(l)$ on $[a,b]$. This is not our approach here, however; we want
to construct a single estimator that is globally adaptive, and find honest
confidence bands on arbitrary intervals $[a,b]$ for it. The important question
of spatial adaptation is not addressed in the present paper.

3.2. The main assumptions. For the main theorem below, we will need 
some conditions that we state now. The first condition is stochastic in nature 
and is about an exact limit theorem for the maximum deviations of the 
centered linear estimator. Define 



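$$c(K)=\sqrt{\sup_x\int_{\mathbb{R}}K^2(x,y)\,dy},\tag{3.2}$$

and let $A(l):=A(l,K)$, $B(l):=B(l,K)$ be real-valued functions defined on
$\mathbb{N}$, depending only on $K$ and such that $A(l)\simeq l^{1/2}\simeq B(l)$. For $n,l\in\mathbb{N}$, $x\in\mathbb{R}$,
define

$$T_n(l,x,f,K):=\Pr_f\Big\{A(l)\Big(\sqrt{\frac{n_1}{2^l}}\,\frac{1}{c(K)}\sup_{y\in[0,1]}\Big|\frac{f_{n_1}(y,l)-Ef_{n_1}(y,l)}{\sqrt{f(y)}}\Big|-B(l)\Big)\le x\Big\}-e^{-e^{-x}}.$$
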
For $F=[F_1,F_2]$ with $F_1<0<1<F_2$ and $\delta,a>0$ define further the class of
densities

$$\mathcal{D}=\mathcal{D}(a,D,\delta,F)=\Big\{f:\mathbb{R}\to\mathbb{R},\ \int_{\mathbb{R}}f=1,\ f\ge0\ \text{on}\ \mathbb{R},\ f\ge\delta\ \text{on}\ F,\ \|f\|_{a,\infty}\le D\Big\}.\tag{3.3}$$

To avoid triviality we shall only consider combinations of $a,D,\delta,F$ such that
$\mathcal{D}$ is nonempty, and given $a,D$ we shall say that $\delta,F$ are "admissible" if $\mathcal{D}$
is nonempty.

Condition 2. Assume that for every $x\in\mathbb{R}$, every $0<a<\infty$, $0<D<\infty$,
$l_n\in\mathcal{J}_n$ and every admissible $\delta>0$, $F$ we have, as $n\to\infty$, that

$$\sup_{f\in\mathcal{D}(a,D,\delta,F)}|T_n(l_n,x,f,K)|\to0.$$

Verifying this condition is a nontrivial problem in itself, and we discuss 
this in detail in Section 3.4. We also need the following condition on the 
underlying density. 

Condition 3. Suppose $f\in C^t(\mathbb{R})$ for some $t>0$ and that there exist
positive finite constants $b_1\le b_2$ and a positive integer $j_0$ such that for every
integer $j\ge j_0$,

$$b_12^{-jt}\le\|K_j(f)-f\|_\infty\le b_22^{-jt}.$$

Note that the upper bound is standard and can be shown to follow from
$f\in C^t(\mathbb{R})$ for $t<r$ and $K$ satisfying Condition 1 [cf. Theorem 9.3 in Härdle
et al. (1998)]. For the uniformity results below it is convenient to require
the upper bound also in the boundary case $t=r$ [which does not necessarily
follow from just $f\in C^r(\mathbb{R})$, but holds, for example, for $r$-times differentiable
functions with $r$ bounded derivatives; cf. Theorem 8.1 in Härdle et al.
(1998)]. The lower bound on the error of approximation of $f$ by $K_j(f)$ is a
crucial assumption, and we refer to Section 3.5 for a detailed discussion of
this condition.

In the construction of confidence intervals, it is well known [since Bickel
and Rosenblatt (1973); see also Hall (1992) and Picard and Tribouley (2000)]
that one should "undersmooth." In the classical case of convolution kernels
this means that we should decrease the bandwidth $2^{-j_n}$ to $2^{-j_n-u_n}$ and
use the function $f_{n_1}(y,j_n+u_n)$ as the center of the band where $u_n$ is some
sequence of positive numbers. In the context of wavelets this means that we
should add a block of empirical wavelet coefficients at resolutions $j_n\le j<j_n+u_n$
to our estimator so that the function

$$f_{n_1}(y,j_n+u_n)=f_{n_1}(y,j_n)+\sum_{l=j_n}^{j_n+u_n-1}\sum_k\hat\beta_{lk}(1)\psi_{lk}(y)\tag{3.4}$$

is the center of the confidence band.

Condition 4. Let $u_n$ be a sequence of positive integers such that
$2^{u_n}\simeq(\log n)^2$.

3.3. The main result. Let $j_n$ be the data-driven resolution level from
(3.1); recall the constants from Condition 2, and define

$$\sigma_n=\sqrt{\frac{2^{j_n+u_n}}{n_1}},\qquad A_n=A(j_n+u_n),\qquad B_n=B(j_n+u_n).$$

If $c(K)$ is as in (3.2), then the size of the band around $f_{n_1}(y,j_n+u_n)$ will
be twice

$$s_n(y,x)=\sigma_nc(K)\sqrt{f_{n_1}(y,j_n+u_n)}\Big(\frac{x}{A_n}+B_n\Big),\tag{3.5}$$

and we note that this quantity can be shown to be eventually positive for
every $x\in\mathbb{R}$, but for fixed $n$ we implicitly assume that $x$ is large enough
so that $s_n(y,x)>0$. We emphasize that $s_n(y,x)$ is completely data-driven
[except for the dependence on $\|f\|_\infty$ through (3.1) discussed in Remark 2].
The confidence band we propose for $f$ is

$$C_n(x,y):=\big[f_{n_1}(y,j_n+u_n)-s_n(y,x),\ f_{n_1}(y,j_n+u_n)+s_n(y,x)\big],\qquad x,y\in\mathbb{R},\tag{3.6}$$

and the probability of inferential interest is, for $x\in\mathbb{R}$,

$$\Pr_f\{f(y)\in C_n(x,y)\ \text{for every}\ y\in[0,1]\}.$$

As mentioned before, we restrict ourselves here to the interval $[0,1]$, but any
bounded set $[a,b]$ is possible as long as $f$ is bounded away from zero on a
neighborhood of $[a,b]$.

Our adaptation result will be shown to hold for densities satisfying Condition
3 with $t\in(0,r]$, where $r$ is the regularity of the kernel $K$ from Condition
1. Moreover, the result will be uniform for $t$ in any compact subset of $(0,r]$.
To describe exactly the set of densities over which our results hold uniformly,
define first, for fixed $0<\eta\le r$, $0<b<\infty$, $0<b_1\le b_2<\infty$ and $j_0\in\mathbb{N}$,

$$\mathcal{P}(\eta,r,b,b_1,b_2,j_0):=\bigcup_{\eta\le t\le r}\big\{f\in C^t(\mathbb{R}):\|f\|_{\eta,\infty}\le b,\ b_12^{-jt}\le\|K_j(f)-f\|_\infty\le b_22^{-jt}\ \forall j\ge j_0\big\}.$$

This class is just the union over $t\in[\eta,r]$ of functions that satisfy Condition
3 for the given $t$ (and constants $b_1,b_2,j_0$), and that are also contained in a
fixed ball of $C^\eta(\mathbb{R})$. We assume implicitly that $b$ is large enough so that this
class is nonempty.

Let then $\mathcal{D}(\eta,b,\delta,F)$ be the set from (3.3) where $\delta,F$ are admissible, and
define

$$\mathcal{P}:=\mathcal{P}(\eta,r,b,b_1,b_2,j_0,\delta,F)=\mathcal{P}(\eta,r,b,b_1,b_2,j_0)\cap\mathcal{D}(\eta,b,\delta,F).\tag{3.7}$$

This set simply consists of densities that are in $\mathcal{P}(\eta,r,b,b_1,b_2,j_0)$ and that
are also bounded away from zero on $F$. It is easy to see that this set is
nonempty, and we shall discuss this class of densities in detail in Section 3.5.
Clearly, for every $f\in\mathcal{P}$ there exists a unique $t:=t(f)$ for which Condition
3 is satisfied.



Theorem 1. Let $f_{n_1}(y,l)$ be the estimator from (2.1) with $K$ satisfying
Condition 1 for some $r\ge1$. Let further $j_n$ be defined as in (3.1), and let $c(K)$
be as in (3.2). Assume that Conditions 2 and 4 are satisfied for $f_{n_1}(y,l)$ and
$u_n$, respectively. Then we have for every $b>0$, $0<b_1\le b_2<\infty$, $j_0\in\mathbb{N}$,
$0<\eta\le r$ and every admissible $\delta>0$, $F=[F_1,F_2]$ satisfying $F_1<0<1<F_2$ that

$$\sup_{f\in\mathcal{P}(\eta,r,b,b_1,b_2,j_0,\delta,F)}\Big|\Pr_f\Big\{A_n\Big(\sup_{y\in[0,1]}\Big|\frac{f_{n_1}(y,j_n+u_n)-f(y)}{c(K)\sigma_n\sqrt{f_{n_1}(y,j_n+u_n)}}\Big|-B_n\Big)\le x\Big\}-e^{-e^{-x}}\Big|$$

converges to $0$ as $n\to\infty$. Furthermore, for every $\varepsilon>0$ there exists a constant
$L$ such that, for every $n\in\mathbb{N}$, ($t=t(f)$)

$$\sup_{f\in\mathcal{P}(\eta,r,b,b_1,b_2,j_0,\delta,F)}\Pr_f\big\{\sigma_n\ge Ln^{-t/(2t+1)}(\log n)^{-1/(2(2t+1))}2^{u_n/2}\big\}\le\varepsilon.\tag{3.8}$$



In the proof, which is given in Section 4.4.1, we will show that

$$0<\inf_{y\in[0,1]}f_{n_1}^{1/2}(y,j_n+u_n)\le\sup_{y\in[0,1]}f_{n_1}^{1/2}(y,j_n+u_n)\le L'$$

for some constant $L'$ on sets of probability approaching one, and the fraction
in the above theorem has to be understood accordingly. Moreover, $A(l)\simeq\sqrt{l}$
(Condition 2), Condition 4 and $j_n\in\mathcal{J}$ imply

$$A_n=A(j_n+u_n)\simeq\sqrt{\log n},$$

and likewise for $B_n$. Combining this with Condition 4 we have the following:

Corollary 1. Let the assumptions of Theorem 1 be satisfied; let $C_n(y,x)$
be the confidence band from (3.6), and let $\mathcal{P}$ be as in (3.7). Then, for every
$x\in\mathbb{R}$,

$$\sup_{f\in\mathcal{P}}\big|\Pr_f\{f(y)\in C_n(y,x)\ \forall y\in[0,1]\}-e^{-e^{-x}}\big|$$

converges to zero as $n\to\infty$. Furthermore, this confidence band is adaptive:
If $2s_n(y,x)$ is the length of $C_n(y,x)$ at $y\in[0,1]$, then for every $\varepsilon>0$ there
exists a constant $L$ such that, for every $n\in\mathbb{N}$ and every $x\in\mathbb{R}$ ($t=t(f)$)

$$\sup_{f\in\mathcal{P}}\Pr_f\Big\{\sup_{y\in[0,1]}s_n(y,x)\ge L(n/\log n)^{-t/(2t+1)}2^{u_n/2}\Big\}\le\varepsilon.\tag{3.9}$$
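
To turn the Gumbel limit into a band with nominal coverage $1-\alpha$, one solves $e^{-e^{-x}}=1-\alpha$ for $x$ and plugs the result into the half-width $s_n(y,x)$. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the centering constant `B` must be supplied by the caller because its definition is not reproduced in this section, and `A_convolution` uses the form $\sqrt{2(\log 2)l}$ that Proposition 1 gives for convolution kernels.

```python
import math


def gumbel_quantile(alpha: float) -> float:
    """Solve exp(-exp(-x)) = 1 - alpha for x."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    return -math.log(-math.log(1.0 - alpha))


def A_convolution(level: float) -> float:
    """Scaling constant A(l) = sqrt(2 (log 2) l) for convolution kernels."""
    return math.sqrt(2.0 * math.log(2.0) * level)


def band_half_width(f_hat, n1, level, c_K, A, B, alpha):
    """Half-width sigma_n * c(K) * sqrt(f_hat) * (x / A + B) at one point y.

    f_hat : value of the undersmoothed estimator at y (must be positive)
    level : resolution level j_n + u_n, so sigma_n = sqrt(2^level / n1)
    A, B  : scaling and centering constants at that level (caller-supplied)
    """
    x = gumbel_quantile(alpha)
    sigma_n = math.sqrt(2.0 ** level / n1)
    return sigma_n * c_K * math.sqrt(f_hat) * (x / A + B)
```

The band at $y$ is then the estimate plus or minus `band_half_width(...)`. As the paper stresses, this is an asymptotic construction, so finite-sample coverage should be checked by simulation before any practical use.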

Remark 2. The definition of $j_n$ in (3.1) involves two "unknown" quantities.
The first is $\|f\|_\infty$, and in practice this can be replaced, for instance, by
the estimate $\|f_{n_2}(j_{\max})\|_\infty$. All proofs go through for this data-driven choice
as well [arguing as in the proof of Theorem 2 of Giné and Nickl (2009a)],
but we abstain from this to reduce technicalities. Another question is how to
select the constant $M'$. A concrete choice can be obtained from tracking the
constants in the proof of Lemma 2. To obtain good constants one may use
Rademacher-symmetrization in a similar vein as in Giné and Nickl (2010).

3.4. Condition 2 and the asymptotic distributions of suprema of linear
density estimators. Since the results in this section only involve the sample
$S_1$, we set $n_1=n$. The prototypical result required in Condition 2 [without
uniformity in $\mathcal{D}(a,D,\delta,F)$] is for convolution kernel density estimators,
and is due to Bickel and Rosenblatt (1973). Their conditions are too stringent
for our "adaptive" framework, but some refined methods from Giné,
Koltchinskii and Sakhanenko (2004) can be used to verify Condition 2.

Proposition 1. If $K:\mathbb{R}\to\mathbb{R}$ satisfies Condition 1(a), is supported in
$[-1,1]$ and is twice continuously differentiable on $\mathbb{R}$, then the kernel density
estimator $f_n(y,l)$ from (2.1) satisfies Condition 2 with $A(l)=\sqrt{2(\log2)l}$,
$B(l)$ defined in (4.22), and $c(K)=\|K\|_2$.

Proof. Use Proposition 7 below. □

One is next led to ask whether an analogue of the classical Bickel-Rosenblatt
theorem can be proved for the wavelet case as well. This problem has no
simple solution in general, the easiest case being that of Haar wavelets, which
was already considered in Smirnov (1950). Let $f_n(y,l)$ be as in (2.1) where
$\phi=1_{[0,1)}$, which satisfies Condition 1(b) with $r=1$.

Proposition 2. The Haar-wavelet density estimator satisfies Condition
2 with $A(l)$ and $B(l)$ defined in (4.21) and with $c(K)=1$.

Proof. Use Proposition 6 below. □

The Haar wavelet allows one to adapt only up to smoothness one, so it is
of interest to verify Condition 2 for other wavelets that satisfy Condition 1
for $r\ge2$. On the positive side we prove in Section 4.1 a Gaussian reduction
argument for general wavelet estimators similar to the convolution kernel
case. The resulting Gaussian processes are given by the stochastic integrals

$$Y(t)=\int_{\mathbb{R}}K(t,s)\,dW(s),$$

where $W$ is Brownian motion and where $K(x,y)=\sum_k\phi(x-k)\phi(y-k)$
is the wavelet projection kernel. On the negative side, these processes are
nonstationary and therefore we cannot use the classical extreme value theory
for stationary Gaussian processes [Leadbetter et al. (1983)] as Bickel and
Rosenblatt (1973) and Giné, Koltchinskii and Sakhanenko (2004) did.

The theory for nonstationary Gaussian processes is more involved (see
Section 4.2.3). In Section 4.3 we will prove that the wavelet density estimators
based on Battle-Lemarié wavelets of degree $r\le4$ (which satisfy
Condition 1 for this $r$) do satisfy Condition 2. We believe that the condition
holds for Battle-Lemarié wavelets of any degree, but our proof depends on
specific computations that increase in complexity with the degree, and which
we complete only for $r\le4$. See Remark 3 after the proof of Proposition 9
below for more discussion. The case $r=1$ (Haar wavelet) is not repeated.

Proposition 3. The wavelet density estimator $f_n(y,l)$ from (2.1) based
on Battle-Lemarié wavelets $\phi_r$ with $r\in[2,4]$ satisfies Condition 2 with $A(l)$,
B(l) and c(K) = a r as in Propositions 8 and 9. 

3.5. Condition 3 and the class $\mathcal{P}$. As was mentioned in the introduction,
the natural classes for adaptive density estimation in sup-norm loss are balls
$\Sigma(t,L)$ of radius $L$ in $C^t(\mathbb{R})$, where $0<t\le r$ (including the case $t=r$ if the
upper bound in Condition 3 holds with $t=r$). Theorem 1 does not hold for
$\bigcup_{0<t\le r}\Sigma(t,L)$, but only for $\mathcal{P}$, and we want to discuss in detail this class in
order to understand the restrictions imposed, mostly Condition 3. On the
one hand, we recall from the introduction that an honest adaptive confidence
band cannot exist for the full class $\bigcup_{0<t\le r}\Sigma(t,L)$. On the other hand we
shall show below that (i) any $r$-times differentiable density with bounded
continuous derivatives is contained in $\mathcal{P}$ (for some $b_1,b_2$), so our confidence
band is valid, and shrinks at rate $n^{-r/(2r+1)}$ (up to a logarithmic term) for
very smooth densities; (ii) the minimax rate of convergence over $\mathcal{P}\cap\Sigma(t,L)$
is the same as the one over $\Sigma(t,L)$, and (iii) one cannot "generically" improve
upon the class $\mathcal{P}$ in Theorem 1, at least in the following sense: The set of
densities that are contained in $\Sigma(t,L)$ but not in $\mathcal{P}$ contains no given ball
of $\Sigma(t,L)$. Exact statements require some more careful discussion.

Let us first remark that the mild requirement that the density / is bounded 
away from zero on an interval whose interior contains [0, 1] is helpful in the 
verification of Condition 2. This condition could be avoided by using techniques
from Giné, Koltchinskii and Sakhanenko (2004) but at the expense
of considerably more technical proofs (that also lead to modified results). 

The crucial restriction that we impose is Condition 3. Verification of the 
upper bound in that condition is standard and was already discussed im- 
mediately after Condition 3. The delicate part of the condition is the lower 
bound. We start with an informal discussion in the case where K is a con- 
volution kernel, and taking K = l[_i/2,i/2] f° r simplicity (so that r = 2). In 
this case the quantity in this condition reduces to 



(3.10) ||K i (/)-/|| 0O = sup 



1/2 

(f(x-u2-i)-f(x))du 

1/2 



First, if the density $f$ is infinitely differentiable with bounded continuous
derivatives, then this quantity is of order

$$2^{-2j-1}\sup_{x\in\mathbb R}\left|\int_{-1/2}^{1/2}u^2D^2f(x)\,du\right| + o(2^{-2j}) \ge b_1 2^{-2j}$$

for some $b_1 > 0$ and $j$ large enough since no such density on $\mathbb R$ can have a
second derivative that is everywhere zero. The constant $b_1$ is even bounded
away from zero uniformly in the set of all twice differentiable densities that
are supported in a fixed compact interval $[a, b]$ [as is easily seen by expanding
$0 = f(a)$ up to second-order around a point of maximum $x_0$, using also
$f(x_0) \ge (b - a)^{-1}$]. If $K$ is a kernel of order $r$ but not of order $r + 1$, then
the same lower bound holds with $2^{-2j}$ replaced by $2^{-rj}$, and we see that
Condition 3 is then always satisfied with $t = r$ for very smooth densities $f$.
Similar remarks apply to wavelets.
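
As a numerical sanity check of the $2^{-2j}$ order, the following sketch evaluates the
bias of the box-kernel approximation for the standard normal density. The choice of
density and all function names are illustrative assumptions, not taken from the text.
For $K = 1_{[-1/2,1/2]}$ one has $K_j(f)(x) = 2^j[F(x + 2^{-j-1}) - F(x - 2^{-j-1})]$ with $F$ the
distribution function of $f$, and a second-order Taylor expansion suggests
$|K_j(f)(x) - f(x)| \approx 2^{-2j}|f''(x)|/24$.

```python
import math


def normal_pdf(x):
    """Standard normal density (illustrative choice of a very smooth f)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def normal_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def box_kernel_bias(x, j):
    """K_j(f)(x) - f(x) for K = 1_[-1/2,1/2] and f the N(0,1) density."""
    h = 2.0 ** (-j)
    smoothed = (normal_cdf(x + h / 2.0) - normal_cdf(x - h / 2.0)) / h
    return smoothed - normal_pdf(x)


def scaled_bias_at_zero(j):
    """|bias at x = 0| * 2^(2j); expected to approach |f''(0)| / 24."""
    return abs(box_kernel_bias(0.0, j)) * 4.0 ** j


if __name__ == "__main__":
    target = normal_pdf(0.0) / 24.0  # |f''(0)| = f(0) for the normal density
    for j in range(1, 9):
        print(j, scaled_bias_at_zero(j), target)
```

The rescaled bias stabilizes as $j$ grows, which is the behaviour the lower bound
$b_1 2^{-2j}$ describes for this particular density.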

Hence we have to consider the case where $f$ is not very smooth, say
$f \in C^t(\mathbb R)$, but not in $C^{t+\gamma}(\mathbb R)$ for any $\gamma > 0$. For a given function $f$, one can
call, in slight abuse of terminology,

$$t(f) := \sup\{t : f \in C^t(\mathbb R)\}$$

its "Hölder exponent." This exponent is generally not attained for a given
function $f$, but before we address this issue let us continue with some special
cases. Suppose for instance $f$ is infinitely differentiable except at $x_0$ where $f$
behaves locally as $|x - x_0|$ so that $f \in C^1(\mathbb R)$ but $f \notin C^{1+\gamma}(\mathbb R)$ for any $\gamma > 0$.
This means that we would like to verify Condition 3 with $t = 1$. Indeed the
integrand in (3.10), for $x = x_0$, equals $2^{-j}|u|$ so that again $\|K_j(f) - f\|_\infty \ge
b_1 2^{-j}$. More generally we can rewrite the quantity in (3.10) as



Now intuitively we would expect that $f$ in $C^t(\mathbb R)$ but not in $C^{t+\gamma}(\mathbb R)$ for any
$\gamma > 0$ precisely means that $f$ attains the Hölder exponent $t$ and that, for
some $x_0 \in \mathbb R$,

$$\frac{f(x_0 - v) - f(x_0)}{|v|^t}$$

is bounded away from zero or even has a nonzero limit as $v \to 0$ [where the
denominator has to be replaced by $\mathrm{sign}(v)|v|^t$]. Unfortunately
this reasoning is too naive, and it is not difficult to see that $C^t(\mathbb R)$ contains
functions that do not attain their Hölder exponent; in particular there exist
functions in $C^t(\mathbb R) \setminus (\bigcup_{s>t} C^s(\mathbb R))$ for which the Hölder exponent is not $t$.
However, one can show that such a pathology cannot occur for "quasi-every"
function in $C^t(\mathbb R)$.

To be precise, let us recall that a property holds for "quasi-every" element
in a metric space if the set of elements in this space that do not satisfy this
property is nowhere dense (so in particular "meagre" in the sense of Baire
categories). Recall further that a subset $F$ of a metric space is nowhere
dense if the interior of its closure is empty, so in particular $F$ contains no
open subset. For example, a classical result of Banach (1931) is that
"quasi-every" function in the space of continuous functions on $[0, 1]$ (equipped with
the sup-norm) is nowhere differentiable. This is sensible: for instance, any
bounded set of equicontinuous functions is norm-compact in this space, and
compact sets in infinite-dimensional normed linear spaces are always meagre.

Inspired by these ideas and recent results of Jaffard (1997, 2000), we
can now state the main result of this subsection which says that the set of
functions in the Banach space $C^t(\mathbb R)$ that do not satisfy the lower bound in
Condition 3 is nowhere dense in this space.

Proposition 4. For $f \in C^t(\mathbb R)$, let $K_j(f)$ be as in (2.2) where $K(x, y) =
\sum_k \phi(x - k)\phi(y - k)$ and where $\phi$ is $(r - 1)$-regular, $r - 1 > t$. Then the set
of functions

$$M_t = \{f \in C^t(\mathbb R) : \text{there do not exist } b_1 > 0 \text{ and } j_0 \in \mathbb N \text{ s.t. } \|K_j(f) - f\|_\infty \ge b_1 2^{-jt}\ \forall j \ge j_0\}$$

is nowhere dense in (the norm-topology of) the Banach space $C^t(\mathbb R)$.

The proof can be found in Section 4.4.2. Note further that the proofs for
convolution kernels and also for $r - 1 < t < r$ are similar but more technical.

The question now arises as to how exactly this result applies to the set
of densities $\mathcal P$. A first somewhat trivial but necessary observation is that $\mathcal P$
is nonempty; in the convolution kernel case this is already clear from the
discussion before Proposition 4. Moreover, in the case of the Haar wavelet,
it is not difficult to prove directly that if the density $f$ is such that, for some
$x_0 \in \mathbb R$,

$$\frac{f(x_0 - v) - f(x_0)}{\mathrm{sign}(v)|v|^t}$$

has a nonnegative limit as $v \to 0$, $D$ say, then, for $k_0$ such that $x_0 \in (k_0/2^l, (k_0 +
1)/2^l]$ we have $|\beta_{lk_0}(f)| \ge 2^{-l(t+1/2)}(D - \varepsilon)(1 + t)^{-1}(1 - 2^{-t})$ for every $\varepsilon > 0$
and $l = l(\varepsilon)$ large enough, which verifies the lower bound in Condition 3 by
(4.45) below. In particular every differentiable density $f$ satisfies this
condition for the Haar wavelet. For completeness we show that $\mathcal P$ is nonempty
also for more general wavelet bases (see Section 4.4.2 below). In fact we shall
see there that $\mathcal P$ is quite rich; small local modifications of arbitrary densities
$f \in C^t(\mathbb R)$ are contained in $\mathcal P$.
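
The size of the Haar coefficient can be checked numerically in a special configuration.
The sketch below is an illustration under stated assumptions, not the argument of the
paper: it takes $x_0$ to be the right endpoint $(k_0 + 1)/2^l$ of the dyadic interval, the
local model $f(x) = 1 + D(x_0 - x)^t$ to the left of $x_0$, and the Haar wavelet
$\psi = 1_{[0,1/2)} - 1_{[1/2,1)}$ (a sign convention that does not affect $|\beta_{lk_0}(f)|$).
All function names are hypothetical. In this configuration the coefficient equals
$2^{-l(t+1/2)}D(1 + t)^{-1}(1 - 2^{-t})$, that is, the bound with $\varepsilon = 0$.

```python
def haar_coefficient(f, l, k, n_grid=20000):
    """Midpoint-rule approximation of beta_{lk}(f) = int f(x) psi_{lk}(x) dx,
    where psi_{lk}(x) = 2^(l/2) psi(2^l x - k), psi = 1_[0,1/2) - 1_[1/2,1)."""
    width = 2.0 ** (-l)
    left = k * width
    step = width / n_grid
    total = 0.0
    for i in range(n_grid):
        x = left + (i + 0.5) * step
        sign = 1.0 if i < n_grid // 2 else -1.0  # +1 on left half, -1 on right half
        total += sign * f(x)
    return 2.0 ** (l / 2.0) * total * step


def local_model(x0, t, D):
    """f(x) = 1 + D (x0 - x)^t for x <= x0 (illustrative local behaviour only)."""
    return lambda x: 1.0 + D * max(x0 - x, 0.0) ** t


def predicted_size(l, t, D):
    """2^(-l(t + 1/2)) D (1 + t)^(-1) (1 - 2^(-t))."""
    return 2.0 ** (-l * (t + 0.5)) * D * (1.0 - 2.0 ** (-t)) / (1.0 + t)


if __name__ == "__main__":
    l, k0, t, D = 6, 10, 0.5, 2.0
    x0 = (k0 + 1) / 2 ** l
    print(haar_coefficient(local_model(x0, t, D), l, k0), predicted_size(l, t, D))
```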

To return to the interpretation of Proposition 4, let $\Sigma(t, L)$ be a ball in
$C^t(\mathbb R)$, and define the subset of densities that are bounded away from zero
on $F$,

S(t) :=S(i,I)n%L,(5,F). 

It is natural to consider the trace (or relative) topology on $\Sigma(t)$ as a subset
of the Banach space $C^t(\mathbb R)$. Proposition 4 then implies that the set of
densities that are in $\Sigma(t)$ but not in $\mathcal P(\eta, r, b, b_1, b_2, \delta, F)$ for any $b_1, b_2$ (so those
functions over which an adaptive sup-norm risk bound can be established but
for which our adaptive confidence band is not necessarily valid) is nowhere
dense in the trace topology. If Theorem 1 is interpreted as a pointwise (in
$f$) result, then these findings imply that there exists no (relatively) open set
in $\Sigma(t)$ for which Condition 3 does not hold, and our adaptation result holds
for "quasi-every" density in $\Sigma(t)$. Clearly, to obtain uniformity of the limit
in Theorem 1, we have to fix a value of $b_1$, but given a (relatively) open set
$O$ in $\Sigma(t)$ we can always choose $b_1$ so small that $O$ intersects $\mathcal P$ and hence
is not contained in $\Sigma(t) \setminus \mathcal P$.

We finally remark that the minimax sup-norm risk over $\Sigma(t)$ is, for every
$n$, the same as the one over $\Sigma(t) \cap \mathcal P$: This follows from the fact that $\mathcal P \cap \Sigma(t)$
is $\|\cdot\|_\infty$-dense (as $b_1 \searrow 0$) in $\Sigma(t)$, that the mapping $f \mapsto E_f\|T_n - f\|_\infty$ is
continuous from $(\Sigma(t), \|\cdot\|_\infty)$ to $\mathbb R$ if $T_n$ is any measurable function of the
sample which satisfies, for some fixed constant $c$, $\|T_n\|_\infty \le c$ with probability
one and that estimators $T_n$ that do not satisfy the last property for any $c$
can be neglected in the minimax risk.

4. Proofs. The proofs are organized into several parts. We start with 
some probabilistic results (Sections 4.1-4.3) that are central to verifying 
Condition 2. The statistically more relevant proofs are in Section 4.4 and 
depend on these probabilistic results only through Condition 2, so they can 
be read independently. 

4.1. A Gaussian reduction for maximal deviations of linear density
estimators. In what follows, given a metric space $(T, d)$ and $\varepsilon > 0$, the covering
number $N(T, d, \varepsilon)$ denotes the smallest possible cardinality of any covering
of $(T, d)$ by closed $d$-balls of radius at most $\varepsilon$. Its logarithm is referred to as
the metric entropy of $(T, d)$. We also recall that a process $Y(t)$ on a metric
space $(T, d)$ is said to be sample continuous if there exists a version of the
process whose sample paths are all bounded and uniformly continuous. For
Gaussian processes $Y(t)$, $t \in T$, unless specified otherwise, the distance $d$ is
automatically taken to be the one provided by the process itself,

$$d(s, t) = \bigl(E(Y(t) - Y(s))^2\bigr)^{1/2}.$$

We let $K : \mathbb R^2 \to \mathbb R$ be a measurable function satisfying:

(K1) $K$ is symmetric in its arguments, bounded, and for all $s \in \mathbb R$, $K(s, t)$
is right or left continuous in $t$ for every $t \in \mathbb R$;

(K2) $\sup_t \|K(t, \cdot)\|_v := \|K\|_v < \infty$ where $\|\cdot\|_v$ denotes the total
variation norm on $\mathbb R$, $K(t, -\infty) = 0$ for all $t$;

(K3) there is a bounded, nonincreasing, exponentially decaying function
$\Phi : \mathbb R^+ \cup \{0\} \to \mathbb R^+ \cup \{0\}$ such that

$$|K(x, y)| \le \Phi(|x - y|);$$

(K4) for all $\lambda \ge 1$, the covering numbers $N(\lambda[F_1, F_2], d, \varepsilon)$ of the intervals
$[\lambda F_1, \lambda F_2]$ for the pseudo-distance $d(s, t) = (\int_{\mathbb R}(K(t, u) - K(s, u))^2\,du)^{1/2}$
admit the bounds

$$N(\lambda[F_1, F_2], d, \varepsilon) \le \frac{A'\lambda^{v_1}}{\varepsilon^{v_2}}$$

for some $A', v_i < \infty$ independent of $\varepsilon, \lambda$, and these bounds are valid for all
positive $\varepsilon$ not exceeding the $d$ diameter of $[\lambda F_1, \lambda F_2]$, and
(K5) there exist $A, v$ finite such that if $\mathcal K = \{K(2^j t, 2^j(\cdot)) : t \in \mathbb R, j \in
\mathbb N \cup \{0\}\}$ and if $\mathcal Q$ is the set of Borel probability measures on $\mathbb R$, then

$$\sup_{Q\in\mathcal Q} N(\mathcal K, L^2(Q), \varepsilon) \le \left(\frac{A}{\varepsilon}\right)^v \qquad \text{for } 0 < \varepsilon \le \|K\|_\infty. \tag{4.1}$$


Let $I = [a, b]$. Given a real sequence $j_n \to \infty$, define on $I$ the Gaussian
processes

$$Y_n(t) = 2^{j_n/2}\int_{\mathbb R} K(2^{j_n}t, 2^{j_n}s)\,dW(s) = \int_{\mathbb R} K(2^{j_n}t, u)\,dW(u), \tag{4.2}$$

where $W$ is standard Brownian motion. It will often be convenient to rewrite
$Y_n(t)$ as $Y_n(t) = Y(2^{j_n}t)$ where

$$Y(t) = \int_{-\infty}^{\infty} K(t, s)\,dW(s). \tag{4.3}$$

Note also that condition (K4) ensures that the processes $Y_n$ are sample
continuous; for $u, v \in I$,

$$d_n^2(u, v) := E(Y_n(u) - Y_n(v))^2 = \int_{\mathbb R}\bigl(K(2^{j_n}u, s) - K(2^{j_n}v, s)\bigr)^2\,ds \le d^2(2^{j_n}u, 2^{j_n}v) \tag{4.4}$$

so that $N(I, d_n, \varepsilon) \le N(2^{j_n}I, d, \varepsilon)$ and it follows from condition (K4) that
the square root of the metric entropy of $I$ with respect to the distance $d_n$
is integrable at zero, and hence the claim is an immediate consequence of
Dudley's theorem [Theorem 2.6.1 in Dudley (1999)]. In particular, if we
still denote a sample continuous version of $Y_n$ by $Y_n$, the norms $\|Y_n\|_I =
\sup_{t\in I}|Y_n(t)|$ are proper random variables.
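Let now

$$\mathcal F_n = \bigcup_{f\in\mathcal D}\mathcal F_n^f, \qquad \mathcal F_n^f = \left\{K(2^{j_n}t, 2^{j_n}\cdot)/\sqrt{f(t)} : t \in I\right\}. \tag{4.5}$$

Given $f \in \mathcal D$, let $X_i$ be i.i.d. with law $dP_f(t) := f(t)\,dt$, and let, as usual,

$$\nu_n^f = \frac{1}{\sqrt n}\sum_{i=1}^n(\delta_{X_i} - P_f)$$

be the empirical processes based on the sequence $X_i$. Note that by the
properties of $K$ and $f$, the supremum in $\|\nu_n^f\|_{\mathcal F_n^f}$ is countable and hence
measurable.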
The goal of this subsection is to prove the following proposition, in the
spirit of Bickel and Rosenblatt (1973) and Giné, Koltchinskii and Sakhanenko
(2004), and the proof adapts techniques from the last reference to the
present situation. In what follows, $\Pr_f$ will still denote the product
probability $P_f^{\mathbb N}$, but the symbol $\Pr$ will denote the probability measure determining
the laws of all relevant other random variables (such as $Y_n$, and random
variables constructed in the Gaussian coupling in the proof of Proposition
5 below).

Proposition 5. Let $I = [a, b]$, let $K$ be a function satisfying conditions
(K1)-(K5) above, and let $j_n \to \infty$ as $n \to \infty$. Let $\{A_n\}$ and $\{B_n\}$ be
numerical sequences such that $A_n \to \infty$ and

$$A_n = o\left(\frac{\sqrt n}{2^{j_n/2}\log n} \wedge 2^{j_n/2} \wedge \frac{2^{\alpha j_n}}{\sqrt{j_n}}\right) \tag{4.6}$$

for some $0 < \alpha \le 1$. Assume that there exists a random variable $Z$ with
continuous distribution such that

$$\lim_{n\to\infty}\Pr\{A_n(\|Y_n\|_I - B_n) \le x\} = \Pr\{Z \le x\}, \qquad x \in \mathbb R, \tag{4.7}$$

where the processes $Y_n$ are defined by (4.2). Let $\mathcal D(\alpha, D, \delta, F)$ be as in (3.3)
for the given $\alpha$, some $0 < D < \infty$, and admissible $\delta > 0$, $F = [F_1, F_2] \supset I$.
Define, for each $f \in \mathcal D$, $\mathcal F_n^f$ as in (4.5), and let further $\nu_n^f$, $n \in \mathbb N$, be the
empirical processes based on the variables $X_i$. Then, for all $x \in \mathbb R$,

$$\lim_{n\to\infty}\sup_{f\in\mathcal D}\left|\Pr\nolimits_f\{A_n(2^{j_n/2}\|\nu_n^f\|_{\mathcal F_n^f} - B_n) \le x\} - \Pr\{Z \le x\}\right| = 0. \tag{4.8}$$


Proof. Step 1: By Theorem 3 in Komlós, Major and Tusnády (1975),
there is a probability space with a sequence $U_i$ of i.i.d. uniform on $[0, 1]$
random variables and a sequence of Brownian motions $W_n$ defined on it such
that, setting

$$\alpha_n(t) = \frac{1}{\sqrt n}\sum_{i=1}^n\bigl(I_{[0,t]}(U_i) - t\bigr)$$

and $W_n^\circ(t) = W_n(t) - tW_n(1)$, then,

$$\Pr\left\{\|\alpha_n - W_n^\circ\|_{[0,1]} > \frac{x + C\log n}{\sqrt n}\right\} \le \Lambda e^{-\theta x}, \qquad 0 < x < \infty,\ n \in \mathbb N, \tag{4.9}$$

for some universal finite, positive constants $C, \Lambda, \theta$.

Define new random variables $\tilde X_i = F_f^{-1}(U_i)$, where $F_f^{-1}$ is the left
continuous generalized inverse of the distribution function $F_f$ of $f$, right continuous
at zero. For every $f \in \mathcal D$ the variables $\tilde X_i$ are i.i.d. with law $P_f$, and we
denote by $\tilde\nu_n^f$ the associated empirical process. By (K2) and $f \ge \delta$ on $I$,
the functions in $\mathcal F_n$ have total variation norm not exceeding $\|K\|_v/\sqrt\delta$, and
since $F_f^{-1}$ is monotone, it follows that the same bound on the total variation
norm (for functions on $[0, 1]$) holds for all the functions in the classes

$$\tilde{\mathcal F}_n^f = \{h \circ F_f^{-1} : h \in \mathcal F_n^f\}, \qquad f \in \mathcal D,\ n \in \mathbb N.$$

Moreover, if $g$ is nonincreasing on $[0, 1]$ with $g(0) = 1$ and $g(1) = 0$, then $g$ is
the pointwise nondecreasing limit (and by dominated convergence, also the
limit in $L^2([0, 1])$) of convex combinations of indicators $I_{[0,t]}$, $0 < t \le 1$. So
by (K2) both $\alpha_n$ and $W_n$ extend from indicator functions $I_{[0,t]}$ to functions in
$\tilde{\mathcal F}_n^f$ by linearity and continuity [see, e.g., Dudley (1985), Theorems 5.1-5.3],
and so does $W_n^\circ$. We conclude that, for all $f \in \mathcal D$,

$$\|\alpha_n - W_n^\circ\|_{\tilde{\mathcal F}_n^f} \le \|K\|_v\,\delta^{-1/2}\,\|\alpha_n - W_n^\circ\|_{[0,1]},$$

and, writing $G^\circ_{n,f}(g) = W_n^\circ(g \circ F_f^{-1})$ for $g \in \mathcal F_n^f$, that $E(G^\circ_{n,f}(g)G^\circ_{n,f}(\bar g)) =
P_f(g\bar g) - (P_f g)(P_f\bar g)$, i.e., $G^\circ_{n,f}$ is a (sample continuous) version of the
$P_f$-Brownian bridge. Since, furthermore,

$$\alpha_n(g \circ F_f^{-1}) = \tilde\nu_n^f(g)$$

by construction, (4.9) gives

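$$\sup_{f\in\mathcal D}\Pr\left\{\|\tilde\nu_n^f - G^\circ_{n,f}\|_{\mathcal F_n^f} > \frac{\|K\|_v(x + C\log n)}{\sqrt{n\delta}}\right\}
= \sup_{f\in\mathcal D}\Pr\left\{\|\alpha_n - W_n^\circ\|_{\tilde{\mathcal F}_n^f} > \frac{\|K\|_v(x + C\log n)}{\sqrt{n\delta}}\right\}$$

$$\le \Pr\left\{\|\alpha_n - W_n^\circ\|_{[0,1]} > \frac{x + C\log n}{\sqrt n}\right\} \le \Lambda e^{-\theta x}$$

for all $x > 0$ and $n \in \mathbb N$. Taking $x = (C' - C)\log n$ for some $C' > C$ in this
inequality, we have

$$\sup_{f\in\mathcal D}\Pr\left\{\|\tilde\nu_n^f - G^\circ_{n,f}\|_{\mathcal F_n^f} > \frac{C'\|K\|_v\log n}{\sqrt{n\delta}}\right\} \le \frac{\Lambda}{n^{(C'-C)\theta}}. \tag{4.10}$$

In particular, if

$$A_n = o\left(\frac{\sqrt n}{2^{j_n/2}\log n}\right), \tag{4.11}$$

then (4.10) implies that there exists a sequence $\varepsilon_n' \to 0$ such that

$$\lim_{n\to\infty}\sup_{f\in\mathcal D}\Pr\{A_n 2^{j_n/2}\|\tilde\nu_n^f - G^\circ_{n,f}\|_{\mathcal F_n^f} > \varepsilon_n'\} = 0. \tag{4.12}$$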
Consider next the processes $G_{n,f}(g) = W_n(g \circ F_f^{-1})$, $g \in \mathcal F_n^f$, which are sample
continuous versions of the $P_f$-Brownian motion $W_{P_f}$ since

$$E\bigl(W_n(g \circ F_f^{-1})W_n(\bar g \circ F_f^{-1})\bigr) = \int_0^1 g \circ F_f^{-1}(x)\,\bar g \circ F_f^{-1}(x)\,dx = \int_{\mathbb R} g(y)\bar g(y)f(y)\,dy.$$

Since $W_n^\circ(g \circ F_f^{-1}) = W_n(g \circ F_f^{-1}) - (\int_0^1 g \circ F_f^{-1})W_n(1)$, and, since,
by (K3),

$$\sup_{g\in\mathcal F_n^f}\left|\int_0^1 g \circ F_f^{-1}(t)\,dt\right| = \sup_{g\in\mathcal F_n^f}|P_f g| \le \delta^{-1/2}\|\Phi\|_1 2^{-j_n}, \tag{4.13}$$

it follows that if

$$A_n = o(2^{j_n/2}), \tag{4.14}$$

then we can replace $G^\circ_{n,f}$ by $G_{n,f}$ in (4.12); that is, there exists $\varepsilon_n' \to 0$ such
that

$$\lim_{n\to\infty}\sup_{f\in\mathcal D}\Pr\{A_n 2^{j_n/2}\|\tilde\nu_n^f - G_{n,f}\|_{\mathcal F_n^f} > \varepsilon_n'\} = 0. \tag{4.15}$$

[Note that by the results of Dudley (1985) alluded to above, for all $n$ and $f$,
the process $W_{P_f}(g)$, $g \in \mathcal F_n^f$, is sample continuous (hence sample bounded).]
Step 2: To compare $G_{n,f}$ on $\mathcal F_n^f$ with $Y_n$ we must couple in the right way
sample continuous versions of both processes. Since the functions in $\mathcal F_n^f$ are
parametrized by $t \in I$, we will write (in slight abuse of notation) $G_{n,f}(t)$,
$t \in I$, for $G_{n,f}(g_t)$, $g_t(\cdot) = K(2^{j_n}t, 2^{j_n}\cdot)/\sqrt{f(t)} \in \mathcal F_n^f$. First we observe that
the process,

$$W\bigl(K(2^{j_n}t, 2^{j_n}\cdot)\sqrt{f(\cdot)/f(t)}\bigr), \qquad t \in I,$$

where $W$ is Brownian motion acting on functions as described in step 1, is a
version of $G_{n,f}$ (both processes have the same covariance). Next we observe
that the set $\mathcal G_n$ defined by

$$\mathcal G_n = \left\{2^{j_n/2}K(2^{j_n}t, 2^{j_n}\cdot),\ K(2^{j_n}t, 2^{j_n}\cdot)\sqrt{f(\cdot)/f(t)} : t \in I\right\}$$

is a GC subset of $L^2(\mathbb R, \lambda)$ where $\lambda$ is Lebesgue measure; this follows from the
entropy bounds (K4) and (K5) and the results in Dudley (1999), Sections
2.5 and 2.6, in particular Theorem 2.6.1. Hence, the restriction to $\mathcal G_n$ of
the isonormal process of $L^2(\mathbb R, \lambda)$, which we write as $g \mapsto \int_{\mathbb R} g\,dW = W(g)$,
admits a version with bounded uniformly continuous sample paths [for the
$L^2(\mathbb R, \lambda)$ distance]. We call $\tilde G_{n,f}(t)$ and $\tilde Y_n(t)$ the restrictions of this process
to the sets $\{K(2^{j_n}t, 2^{j_n}\cdot)\sqrt{f(\cdot)/f(t)} : t \in I\}$ and $\{2^{j_n/2}K(2^{j_n}t, 2^{j_n}\cdot) : t \in I\}$,
respectively. They are versions of $G_{n,f}$ and $Y_n$, respectively, and, as we see
next, we can control the supnorm of their difference. Set

$$Z_{n,f}(t) = 2^{j_n/2}\tilde G_{n,f}(t) - \tilde Y_n(t) = 2^{j_n/2}\int_{\mathbb R} K(2^{j_n}t, 2^{j_n}s)\left(\sqrt{\frac{f(s)}{f(t)}} - 1\right)dW(s), \qquad t \in I.$$

We have for u,v £ I, 

d ZnJ (u,v) := (E(Z nJ (u) - Z nJ (v)f) 1/2 

<5^ 2 \\K{2^u,.)-K(2^v,-)\\ L2{Pf) 

+ WKiur^iu) - r l/2 {v)\ + d n (u, v), 

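where $d_n(u, v) = d(2^{j_n}u, 2^{j_n}v)$ [cf. (4.4)], and, using (K4), (K5) and that
$\delta \le f(u) \le D$ for all $u \in I$, it is easy to show that the covering numbers of $I$
for this distance are bounded by

$$N(I, d_{Z_{n,f}}, \varepsilon) \le B2^{v_3 j_n}/\varepsilon^{v_4} \tag{4.16}$$

for every (small) $\varepsilon > 0$ and constants $B, v_3, v_4$ independent of $n$ and $f$. Note
next that by Remark 1, for every $f \in \mathcal D$ there exists $L = L(\mathcal D)$ and $c = c(\mathcal D)$
such that $\|f\|_\infty \le L$ and the $\alpha$-Hölder constant of $f$ is at most $c$. Hence, for
$t \in I$,

$$\bigl(\sqrt{f(t - 2^{-j_n}u)} - \sqrt{f(t)}\bigr)^2 \le L\,I(|u| > \delta 2^{j_n}) + c^2 2^{-2\alpha j_n} I(|u| \le \delta 2^{j_n}),$$

and we obtain for all $t \in I$,

$$E(Z_{n,f}(t))^2 = 2^{j_n}\int_{\mathbb R} K^2(2^{j_n}t, 2^{j_n}s)\left(\sqrt{\frac{f(s)}{f(t)}} - 1\right)^2 ds$$

$$\le 2^{j_n}\delta^{-1}\int_{\mathbb R} K^2(2^{j_n}t, 2^{j_n}s)\bigl(\sqrt{f(s)} - \sqrt{f(t)}\bigr)^2 ds$$

$$\le \delta^{-1}\int_{\mathbb R}\Phi^2(|u|)\bigl(\sqrt{f(t - 2^{-j_n}u)} - \sqrt{f(t)}\bigr)^2 du \tag{4.17}$$

$$\le \delta^{-1}L\|\Phi\|_1\Phi(\delta 2^{j_n}) + \delta^{-1}\|\Phi\|_2^2 c^2 2^{-2\alpha j_n} \le D_1^2 2^{-2\alpha j_n},$$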

where $D_1$ is a constant that does not depend on $n$ nor $f$. That is, the
diameter of $I$ for the $L^2$-distance induced by the process $Z_{n,f}$ is at most
$2D_1 2^{-\alpha j_n}$. Hence, Dudley's entropy bound in expectation form [e.g., de la
Peña and Giné (1999), Corollary 5.1.6], (4.16) and (4.17) give

$$E\sup_{t\in I}|2^{j_n/2}\tilde G_{n,f}(t) - \tilde Y_n(t)| \lesssim D_1 2^{-\alpha j_n} + \int_0^{2D_1 2^{-\alpha j_n}}\sqrt{\log\frac{B2^{v_3 j_n}}{\varepsilon^{v_4}}}\,d\varepsilon \lesssim \sqrt{j_n}\,2^{-\alpha j_n}$$

with unspecified constants independent of $f \in \mathcal D$ and $n$. So, if, besides (4.11)
and (4.14), the sequence $\{A_n\}$ satisfies

$$A_n = o(2^{\alpha j_n}/\sqrt{j_n})$$

[hence, if $\{A_n\}$ satisfies (4.6)], then there exists $\varepsilon_n \to 0$ such that

$$\lim_{n\to\infty}\sup_{f\in\mathcal D}\Pr\{A_n\|2^{j_n/2}\tilde G_{n,f} - \tilde Y_n\|_I > \varepsilon_n\} = 0. \tag{4.18}$$

Step 3: We finally combine the bounds obtained. Clearly $\|\tilde G_{n,f}\|_I$ has the
same probability law as $\|G_{n,f}\|_{\mathcal F_n^f}$, and likewise $\|\tilde Y_n\|_I$ has the same law as
$\|Y_n\|_I$. Therefore, under the hypotheses of the proposition, we have, for all
$f \in \mathcal D$ and $x_n \to x$, $x \in \mathbb R$,

$$[\Pr\{A_n(\|Y_n\|_I - B_n) \le x_n - \varepsilon_n\} - \Pr\{Z \le x\}] - \sup_{f\in\mathcal D}\Pr\{A_n\|2^{j_n/2}\tilde G_{n,f} - \tilde Y_n\|_I > \varepsilon_n\}$$

$$\le \Pr\{A_n(2^{j_n/2}\|G_{n,f}\|_{\mathcal F_n^f} - B_n) \le x_n\} - \Pr\{Z \le x\}$$

$$\le [\Pr\{A_n(\|Y_n\|_I - B_n) \le x_n + \varepsilon_n\} - \Pr\{Z \le x\}] + \sup_{f\in\mathcal D}\Pr\{A_n\|2^{j_n/2}\tilde G_{n,f} - \tilde Y_n\|_I > \varepsilon_n\}.$$

The leftmost and rightmost sides of this inequality do not depend on $f \in \mathcal D$
and tend to zero by (4.7), the continuity of the probability law of $Z$ and
(4.18). Thus we have

$$\lim_{n\to\infty}\sup_{f\in\mathcal D}\left|\Pr\{A_n(2^{j_n/2}\|G_{n,f}\|_{\mathcal F_n^f} - B_n) \le x_n\} - \Pr\{Z \le x\}\right| = 0 \tag{4.19}$$

for any sequence $x_n \to x$, any $x \in \mathbb R$. Similarly, since the random variables
$\|\nu_n^f\|_{\mathcal F_n^f}$ and $\|\tilde\nu_n^f\|_{\mathcal F_n^f}$ have the same law, we have, for any $x \in \mathbb R$,

$$[\Pr\{A_n(2^{j_n/2}\|G_{n,f}\|_{\mathcal F_n^f} - B_n) \le x - \varepsilon_n'\} - \Pr\{Z \le x\}] - \sup_{f\in\mathcal D}\Pr\{A_n 2^{j_n/2}\|\tilde\nu_n^f - G_{n,f}\|_{\mathcal F_n^f} > \varepsilon_n'\}$$

$$\le \Pr\nolimits_f\{A_n(2^{j_n/2}\|\nu_n^f\|_{\mathcal F_n^f} - B_n) \le x\} - \Pr\{Z \le x\}$$

$$\le [\Pr\{A_n(2^{j_n/2}\|G_{n,f}\|_{\mathcal F_n^f} - B_n) \le x + \varepsilon_n'\} - \Pr\{Z \le x\}] + \sup_{f\in\mathcal D}\Pr\{A_n 2^{j_n/2}\|\tilde\nu_n^f - G_{n,f}\|_{\mathcal F_n^f} > \varepsilon_n'\}$$

which, by (4.15) and by (4.19) with $x_n = x \pm \varepsilon_n'$, gives (4.8). □

Condition (K3) is only used in the equation above (4.13) and in (4.17);
therefore it can be relaxed to the following: there is $\Phi$ measurable, bounded
and satisfying that, for some $y_0$ and $\eta > 0$ and all $y > y_0$, $\sup_{x\ge y}\Phi(x) \le
y^{-1-\eta}$, such that $|K(x, y)|$ is dominated by $\Phi(|x - y|)$.

4.2. Examples and some first limit theorems for suprema of certain Gaus- 
sian processes. 

4.2.1. Haar wavelets. The projection kernel corresponding to the Haar
wavelet is

$$K(x, y) = \sum_{k\in\mathbb Z} I_{[0,1)}(x - k)I_{[0,1)}(y - k) = I([x] = [y]). \tag{4.20}$$

It obviously satisfies conditions (K1)-(K3) ($\|K(t, \cdot)\|_v = 2$, $\Phi(|u|) = I(|u| \le
1)$). Moreover, $d^2(x, y) = \int_{\mathbb R}(K(x, u) - K(y, u))^2\,du = 0$ if $[x] = [y]$ and $2$
otherwise so that

$$N(\lambda[F_1, F_2], d, \varepsilon) \le N(\lambda[F_1, F_2], d, 0) \le \lambda(F_2 - F_1) + 2 \le 2\lambda(F_2 - F_1) + 4$$

for $0 < \varepsilon < 2$ (note that 2 is an upper bound for the $d$-diameter of any set
of real numbers) so that (K4) holds. Condition (K5) follows from Lemma 2
in Giné and Nickl (2009b).

So Proposition 5 applies and we are led to consider the process [see (4.3)],

$$Y(t) = \sum_{k\in\mathbb Z} I(t \in [k, k + 1))\int_k^{k+1} dW(s) = \sum_{k\in\mathbb Z} I(t \in [k, k + 1))\,g_k,$$

where $g_k$ are i.i.d. $N(0, 1)$, and therefore, taking $I = [0, 1]$,

$$\sup_{0\le t\le 1}|Y_n(t)| = \sup_{0\le u\le 2^{j_n}}|Y(u)| = \max_{0\le k\le 2^{j_n}}|g_k|.$$

Now Theorem 1.5.3 (and Theorem 1.8.3) in Leadbetter et al. (1983) gives

$$\Pr\left\{A_n\left(\sup_{0\le t\le 1}|Y_n(t)| - B_n\right) \le x\right\} \to e^{-e^{-x}} \qquad \text{for all } x \in \mathbb R,$$

where $A_n = A(j_n)$, $B_n = B(j_n)$ and

$$A(l) = [2(\log 2)l]^{1/2}, \qquad B(l) = A(l) - \frac{\log l + \log(\pi\log 2)}{2A(l)}. \tag{4.21}$$

Combining this with Proposition 5 we have, recalling the set $\mathcal D$ from (3.3).
Proposition 6. Let $\mathcal D = \mathcal D(\alpha, D, \delta, F)$ for some $0 < \alpha \le 1$, $0 < D < \infty$,
and where $\delta, F$ are admissible. If $j_n \to \infty$ as $n \to \infty$ satisfies $j_n 2^{j_n} = o(n/
(\log n)^2)$ and if $f_n := f_n(\cdot, j_n)$ is the Haar wavelet estimator from (2.1) with
$\phi = 1_{[0,1)}$, then

$$\sup_{f\in\mathcal D}\left|\Pr\nolimits_f\left\{A_n\left(\sqrt{n2^{-j_n}}\left\|\frac{f_n - Ef_n}{\sqrt f}\right\|_{[0,1]} - B_n\right) \le x\right\} - e^{-e^{-x}}\right| \to 0$$

for all $x \in \mathbb R$ as $n \to \infty$ where $A_n$ and $B_n$ are as above (4.21).
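
The normalization (4.21) can be checked against the exact distribution of the maximum
of i.i.d. absolute normals. The sketch below is an illustration only: it uses $2^l$
variables (rather than $2^l + 1$, which makes no asymptotic difference), the function
names are hypothetical, and convergence is known to be slow (logarithmic in the number
of variables), so only rough agreement should be expected at moderate $l$.

```python
import math


def A(l):
    """A(l) = [2 (log 2) l]^(1/2) as in (4.21)."""
    return math.sqrt(2.0 * math.log(2.0) * l)


def B(l):
    """B(l) = A(l) - (log l + log(pi log 2)) / (2 A(l)) as in (4.21)."""
    a = A(l)
    return a - (math.log(l) + math.log(math.pi * math.log(2.0))) / (2.0 * a)


def exact_cdf_max_abs(l, x):
    """P{ A(l) (max_{k < 2^l} |g_k| - B(l)) <= x } for i.i.d. N(0,1) g_k.

    Assumes the threshold u = x / A(l) + B(l) is positive.
    """
    u = x / A(l) + B(l)
    tail = math.erfc(u / math.sqrt(2.0))  # P{|g| > u}
    return math.exp((2.0 ** l) * math.log1p(-tail))


def gumbel_cdf(x):
    """Limiting distribution function exp(-exp(-x))."""
    return math.exp(-math.exp(-x))


if __name__ == "__main__":
    for l in (10, 20, 40):
        print(l, [round(exact_cdf_max_abs(l, x) - gumbel_cdf(x), 4)
                  for x in (-1.0, 0.0, 1.0, 2.0)])
```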



4.2.2. Convolution kernels. If $K$ is a real-valued function with bounded
support, and is symmetric and Lipschitz continuous, then the kernel $K(x, y) :=
K(x - y)$ satisfies conditions (K1)-(K4) with $\Phi = K$ and $d(s, t)$ proportional
to $|s - t|$. Condition (K5) is proved in Nolan and Pollard (1987). [These are
not the only convolution kernels satisfying (K1)-(K5) and, for instance, the
Gaussian kernel also satisfies these conditions.]

Assume in what follows that $K$ is bounded, symmetric, supported by
$[-1, 1]$ and twice continuously differentiable on $\mathbb R$. Writing $Y_n(t) = Y(2^{j_n}t)$
with $Y$ as in (4.3) we have

$$\sup_{t\in[0,1]}|Y_n(t)| = \sup_{0\le t\le 2^{j_n}}|Y(t)|.$$

In this case, $Y(t) = \int_{\mathbb R} K(t - s)\,dW(s)$ is a stationary Gaussian process with
covariance

$$r(t) := E(Y(t)Y(0)) = \int_{\mathbb R} K(t + u)K(u)\,du = \|K\|_2^2 - Ct^2 + o(t^2),$$

where $C = -2^{-1}\int_{\mathbb R} K(u)K''(u)\,du > 0$ (by integration by parts), and $r(t) = 0$
for $|t| > 2$. Set $\tilde Y = Y/\|K\|_2$ and $\tilde C = C/\|K\|_2^2$. We apply Theorem 8.2.7 in
Leadbetter et al. (1983), in its version for absolute values (Corollary 11.1.6
in the same reference) with

$$B(l) = \sqrt{2(\log 2)l} + \frac{\log\sqrt{2\tilde C} - \log\pi}{\sqrt{2(\log 2)l}}. \tag{4.22}$$

One has

$$\lim_{n\to\infty}\Pr\left\{\sqrt{2(\log 2)j_n}\left(\sup_{0\le t\le 2^{j_n}}|Y(t)|/\|K\|_2 - B(j_n)\right) \le x\right\} = e^{-e^{-x}}, \qquad x \in \mathbb R,$$

which, combined with Proposition 5, yields the following:
Proposition 7. If $K : \mathbb R \to \mathbb R$ is bounded, symmetric, supported by $[-1, 1]$
and twice continuously differentiable, $\mathcal D$ and $j_n$ are as in Proposition 6, $B(l)$
is as in (4.22) and if $f_n := f_n(\cdot, j_n)$ is the kernel estimator from (2.1), then,
as $n \to \infty$,

$$\sup_{f\in\mathcal D}\left|\Pr\nolimits_f\left\{\sqrt{2(\log 2)j_n}\left(\sqrt{n2^{-j_n}}\left\|\frac{f_n - Ef_n}{\|K\|_2\sqrt f}\right\|_{[0,1]} - B(j_n)\right) \le x\right\} - e^{-e^{-x}}\right| \to 0$$

for $x \in \mathbb R$.
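
To make the constants in (4.22) concrete, the following sketch computes $\|K\|_2^2$, $C$
and $B(l)$ for the triweight kernel $K(u) = (1 - u^2)^3 1_{[-1,1]}(u)$. The kernel is an
illustrative choice (it is bounded, symmetric, supported by $[-1, 1]$ and twice
continuously differentiable); it is not singled out in the text, and the helper names
are hypothetical. The sketch uses $C = \frac12\int (K')^2$, which is the integration by
parts mentioned above, and compares $r(t)$ with its quadratic approximation.

```python
import math


def triweight(u):
    """K(u) = (1 - u^2)^3 on [-1, 1], zero elsewhere."""
    return (1.0 - u * u) ** 3 if abs(u) <= 1.0 else 0.0


def triweight_prime(u):
    """K'(u) = -6 u (1 - u^2)^2 on [-1, 1], zero elsewhere."""
    return -6.0 * u * (1.0 - u * u) ** 2 if abs(u) <= 1.0 else 0.0


def simpson(g, a, b, n=4000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    s = g(a) + g(b)
    for i in range(1, n):
        s += (4.0 if i % 2 else 2.0) * g(a + i * h)
    return s * h / 3.0


def covariance(t):
    """r(t) = int K(t + u) K(u) du; the integrand vanishes outside [-1, 1]."""
    return simpson(lambda u: triweight(t + u) * triweight(u), -1.0, 1.0)


def curvature_constant():
    """C = -(1/2) int K K'' = (1/2) int (K')^2."""
    return 0.5 * simpson(lambda u: triweight_prime(u) ** 2, -1.0, 1.0)


def centering(l):
    """B(l) from (4.22) with C~ = C / ||K||_2^2."""
    a = math.sqrt(2.0 * math.log(2.0) * l)
    c_tilde = curvature_constant() / covariance(0.0)
    return a + (math.log(math.sqrt(2.0 * c_tilde)) - math.log(math.pi)) / a


if __name__ == "__main__":
    t = 0.05
    print(covariance(t), covariance(0.0) - curvature_constant() * t * t)
    print(centering(10))
```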



4.2.3. General wavelet bases. Let $K(x, y) = \sum_k \phi(x - k)\phi(y - k)$ be a
general wavelet projection kernel with scaling function $\phi$. Assuming the
conditions of Proposition 5 are verified for the moment (see below for
examples), we are led to consider the distributions of maxima over increasing
intervals of the process

$$Y(t) = \int_{\mathbb R} K(t, u)\,dW(u) = \sum_k \phi(t - k)\int_{\mathbb R}\phi(s - k)\,dW(s),$$

where $W$ is Brownian motion. Since the functions $\phi(\cdot - k)$ are orthonormal
in $L^2(\mathbb R)$, we can write this process as

$$Y(t) = \sum_{k\in\mathbb Z}\phi(t - k)g_{-k} = \sum_{k\in\mathbb Z}\phi(t + k)g_k, \tag{4.23}$$

where $g_k = \int_{\mathbb R}\phi(u + k)\,dW(u)$ is a sequence of i.i.d. standard normal random
variables.

The process $Y(t)$ is not stationary in general. However, the covariance of
this process, which is given by

$$r(t, t + v) := EY(t)Y(t + v) = \sum_k \phi(t - k)\phi(t + v - k) \tag{4.24}$$

is, for each $v \in \mathbb R$, periodic with period one, and so is its variance function,

$$\sigma^2(t) := EY^2(t) = E\left(\sum_k \phi(t - k)g_{-k}\right)^2 = \sum_k \phi^2(t - k). \tag{4.25}$$

Processes $Y(t)$, $t \in \mathbb R$, whose covariance function $t \mapsto r(t, t + v)$ is periodic in
$t$ for every $v \in \mathbb R$, with a period independent of $v$, are called cyclostationary.
So, although $Y$ is not stationary, it is cyclostationary with period one.

While the asymptotic distributional theory for maxima of nonstationary
Gaussian processes is not as complete as for stationary processes, there are in
particular some results for "cyclostationary processes." These results involve
a careful analysis of the variance and covariance functions, and this requires
a case-by-case treatment of the wavelet basis in question. For Battle-Lemarié
wavelets we can carry this through and will prove the relevant limit theorem
for their associated Gaussian processes $Y$ in the next section.

Particularly interesting are the compactly supported wavelets, such as the
Daubechies family. The Daubechies scaling functions $\phi = \phi_N$ [see Daubechies
(1992) and also Chapter 7 in Härdle et al. (1998)] have compact support and
are differentiable for $N \ge 3$. The associated projection kernel $K$ obviously
satisfies (K1) and (K2); it satisfies (K3), as shown, for example, in Härdle
et al. (1998), Lemmas 8.5 and 8.6; (K5) is proved in Lemma 2 in Giné and
Nickl (2009b); and regarding (K4) we note that, by orthonormality,

$$\int_{\mathbb R}(K(x, u) - K(y, u))^2\,du = \sum_k(\phi(x - k) - \phi(y - k))^2 \le \left\|\sum_k|\phi'(\cdot - k)|\right\|_\infty^2 |x - y|^2, \tag{4.26}$$

a distance which satisfies (K4), since $\|\sum_k|\phi'(\cdot - k)|\|_\infty < \infty$ (e.g., Härdle
et al. (1998), Lemma 8.5). So the Gaussian reduction in Proposition 5
applies to Daubechies wavelets for $N \ge 3$, and therefore it remains to prove a
limit theorem for the process $Y(t) = \int_{\mathbb R} K(t, u)\,dW(u)$. The covariance
function of $Y$ in the case of Daubechies wavelets seems difficult to analyze and
we do not know how to derive an exact distributional limit theorem for
$\max_{0\le t\le 2^{j_n}}|Y(t)|$ in this case. We conjecture that these limits exist and are
similar to the Battle-Lemarié cases considered in the next subsection.



4.3. The limit theorem for Battle-Lemarié wavelets.

4.3.1. Battle-Lemarié wavelets. Let

$$N_r(x) = \underbrace{1_{[0,1)} * \cdots * 1_{[0,1)}}_{r\ \text{times}}(x)$$

be the B-spline of order $r$, $r \in \mathbb N$. The scaling function $\phi_r$ that generates the
Battle-Lemarié wavelet basis admits a unique representation,

$$\phi_r(x) = \sum_{k\in\mathbb Z} a_k^{(r)} N_r(x - k), \tag{4.27}$$

where the sequence of coefficients $a_k^{(r)}$ has exponential decay as $|k| \to \infty$
for all $r$ [see Daubechies (1992), Corollary 4.5.2, where one can also find
an explicit definition of these coefficients]. For $r = 1$, $\phi_1$ is the Haar
scaling function which has already been considered. For $r > 1$, the function $\phi_r$
is Lipschitz and, although it does not have bounded support, it decreases
exponentially. In fact, as is easily seen, $\phi_r$ is a $(r - 1)$-regular wavelet basis
(cf. prior to Definition 1).






As a first step we verify the conditions of Proposition 5 for the kernel
$K(x, y)$. Condition (K1) is obvious; condition (K3) holds with an
exponentially decreasing $\Phi$ [using Lemma 8.6 in Härdle et al. (1998)]. Condition
(K2) is, for example, contained in the proof of Lemma 2 in Giné and Nickl
(2010) which itself verifies (K5). Regarding (K4) we can argue directly as
in (4.26) for $r > 2$, in which case $\phi_r$ is differentiable and has uniformly
bounded derivatives. In case $r = 2$ a similar argument works, using that $\phi_2$
is uniformly bounded and Lipschitz on $[k, k + 1]$ for each $k \in \mathbb Z$.

Consequently Proposition 5 applies if we can derive a limit of type (4.7)
for the process,

$$Y^{(r)}(t) = \int_{\mathbb R} K(t, u)\,dW(u) = \sum_k \phi_r(t + k)g_k, \tag{4.28}$$

with variance and covariance as in Section 4.2.3 and where the $g_k$'s are i.i.d.
standard normal.

Since for $r = 2$ the Battle-Lemarié scaling function $\phi_2$ is not differentiable,
we will have to treat the cases $r = 2$ and $r > 2$ separately. In both cases we
rely on the following elementary (but somewhat cumbersome) key lemma
on the maxima of the variance function (4.25). It will be proved in Section
4.4.3.

Lemma 1. Let $\phi_r$ be the scaling function for the Battle-Lemarié wavelet
of order $r$, $r = 2, 3, 4$ and let

$$\sigma_r^2(t) = \sum_k \phi_r^2(t - k), \qquad t \in \mathbb R.$$

Then, $\sigma_3^2(t)$ attains its absolute maximum only at the points $t = l + 1/2$,
$l \in \mathbb Z$, and for $r = 2, 4$, $\sigma_r^2(t)$ attains its absolute maximum only at the points
$t = l \in \mathbb Z$. The values are, for every $l \in \mathbb N$,

$$\sigma_2^2 := \sigma_2^2(l) = \sum_k \phi_2^2(k) = \sum_k (a_k^{(2)})^2,$$

al := aj(l) = £ <t>l (*) = ^ + + 

k k 

and 

2 2n 1 /o\ M i 1 ( 3 ) i (3) \2 

a 3 :=a 3 (l- 1/2) = — + - ; + <_i) , 

k 

where 

m = E(4 3) - 4 3 2,) 2 - E<4 3 ' - 4 3 i)(4 3 i - 4%) 

fc fc 
and al 7 " , G Z, r = 2,3,4, are i/ie coefficients in (4.27). 
4.3.2. The limit theorem for the piecewise linear Battle-Lemarié wavelet.
We first treat the piecewise linear case $r = 2$ where the structure of the
process allows for a direct reduction to the maximum of a stationary
Gaussian sequence. Since $Y^{(2)}(t)$ is piecewise linear, it can be written as
[cf. (4.47) below]

$$Y^{(2)}(t) = \sum_{k\in\mathbb Z}\phi_2(t + k)g_k = \sum_{k\in\mathbb Z}\phi_2(t - l + k)g_{k-l}$$

$$= (t - l)\sum_k a_k^{(2)} g_{k-l} + (1 - (t - l))\sum_k a_{k-1}^{(2)} g_{k-l}, \qquad l \le t \le l + 1,\ l \in \mathbb Z, \tag{4.29}$$

where the variables $g_k$ are i.i.d. standard normal. Let $X(t) := Y^{(2)}(t)/\sigma_2$
be the same process normalized by the maximum of the variance function
$\sigma_2^2(t)$ which is attained at $t = l \in \mathbb Z$ and given in Lemma 1. We shall write
$a_k = a_k^{(2)}$ and $\sigma = \sigma_2$ throughout this subsection, and we recall that the
coefficients $a_k$ decrease exponentially. On any interval $[l, l + 1]$, $l \in \mathbb Z$, $X$ has
the form $X(t) = (t - l)G_1 + G_2$ and, obviously, the absolute maximum of
$|(t - l)G_1 + G_2|$ over $t \in [l, l + 1]$ is attained either at $t = l$ or at $t = l + 1$.
Therefore,

$$\sup_{t\in[0,2^{j_n}]}|X(t)| = \max_{0\le l\le 2^{j_n},\, l\in\mathbb Z}|X(l)|.$$

Considering the process $X$ indexed only by integers, we see that, for all
$l, m \in \mathbb Z$,

$$\sigma^2 E(X(l)X(l + m)) = E\left(\sum_k a_{k-1}g_{k-l}\sum_{k'} a_{k'-1}g_{k'-l-m}\right) = \sum_k a_{k+l-1}a_{k+l+m-1} = \sum_k a_k a_{k+m},$$

that is, the sequence $X(l)$, $l \in \mathbb Z$, is stationary with covariance

$$r(m) = \sum_k a_k a_{k+m}\Big/\sum_k a_k^2.$$

Using the exponential decay of the $a_k$'s in the bound

$$\sum_k |a_k a_{k+m}| \le 2\sup_k|a_k|\sum_{|k|\ge m/2}|a_k|$$

one sees $(\log m)r(m) \to 0$ as $m \to \infty$ (Berman's condition). Then we can
apply the usual result about weak convergence of the maximum of a
stationary Gaussian sequence satisfying Berman's condition [e.g., Theorem 4.3.3
in Leadbetter et al. (1983)], together with the asymptotic independence of
maximum and minimum of such sequences [e.g., Davis (1979), page 459f].
The outcome is that the limit theorem for $\max_{0\le l\le 2^{j_n},\, l\in\mathbb Z}|X(l)|$ is the same
as if the $X(l)$ were independent; with $A_n, B_n$ as above (4.21),

$$\lim_{n\to\infty}\Pr\left\{A_n\left(\sup_{0\le t\le 2^{j_n}}|Y^{(2)}(t)|/\sigma - B_n\right) \le x\right\} = e^{-e^{-x}}, \qquad x \in \mathbb R,$$

which, combined with Proposition 5 (which applies by observations in
Section 4.3.1), gives the following proposition:

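Proposition 8. Let $\mathcal D$, $j_n$ be as in Proposition 6. If $f_n^{(2)} := f_n(\cdot, j_n)$ is the
wavelet density estimator from (2.1) with Battle-Lemarié scaling function
$\phi_2$, if $A_n = A(j_n)$ and $B_n = B(j_n)$ are as in (4.21) and if $\sigma^2 = \sum_k a_k^2$, then,
as $n \to \infty$,

$$\sup_{f\in\mathcal D}\left|\Pr\nolimits_f\left\{A_n\left(\sqrt{n2^{-j_n}}\left\|\frac{f_n^{(2)} - Ef_n^{(2)}}{\sigma\sqrt f}\right\|_{[0,1]} - B_n\right) \le x\right\} - e^{-e^{-x}}\right| \to 0$$

for all $x \in \mathbb R$.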
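
The quantities $\sigma^2 = \sum_k a_k^2$ and $r(m)$ can be evaluated numerically. The sketch
below rests on an assumption that is not spelled out in this section: that the
coefficients in (4.27) come from the standard orthonormalization of the linear B-spline,
so that (up to an index shift, which affects neither $\sum_k a_k^2$ nor $r(m)$) the $a_k$ are
the Fourier coefficients of $(2/3 + \cos(\xi)/3)^{-1/2}$, where $2/3 + \cos(\xi)/3$ is the
symbol of the Gram matrix of the integer translates of $N_2$. Function names are
hypothetical. Under this assumption Parseval's identity gives $\sum_k a_k^2 = \sqrt 3$ and
$r(m) = (\sqrt 3 - 2)^{|m|}$, a geometric decay that makes Berman's condition transparent.

```python
import math


def bl2_coefficients(k_max=40, m=4096):
    """Fourier coefficients a_k, |k| <= k_max, of (2/3 + cos(xi)/3)^(-1/2),
    computed with the trapezoid rule over one full period (assumed convention)."""
    coeffs = {}
    for k in range(0, k_max + 1):
        s = 0.0
        for i in range(m):
            xi = 2.0 * math.pi * i / m
            s += math.cos(k * xi) / math.sqrt(2.0 / 3.0 + math.cos(xi) / 3.0)
        coeffs[k] = coeffs[-k] = s / m  # the symbol is even, so a_k = a_{-k}
    return coeffs


def autocorrelation(coeffs, lag):
    """r(lag) = sum_k a_k a_{k+lag} / sum_k a_k^2 (truncated to the stored range)."""
    num = sum(a * coeffs.get(k + lag, 0.0) for k, a in coeffs.items())
    den = sum(a * a for a in coeffs.values())
    return num / den


if __name__ == "__main__":
    a = bl2_coefficients()
    print(sum(v * v for v in a.values()), math.sqrt(3.0))
    for lag in range(1, 6):
        print(lag, autocorrelation(a, lag), (math.sqrt(3.0) - 2.0) ** lag)
```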



4.3.3. The limit theorem for smooth Battle-Lemarié wavelets. In the case
of $r > 2$ we have to deal with maxima of nonstationary Gaussian processes.
The following theorem is an adaptation to mean square differentiable
cyclostationary processes [cf. after (4.25) above] of a theorem of Piterbarg and
Seleznjev (1994) [see also Konstant and Piterbarg (1993), Hüsler (1999),
Hüsler, Piterbarg and Seleznjev (2003)].

Theorem 2. Let $X(t)$, $t \in \mathbb R$, be a cyclostationary, centered Gaussian
process with period 1, variance $\sigma_X^2(t)$ and covariance $r_X(s, t)$. Assume:

(1) $X(t)$ is mean square differentiable and a.s. continuous;

(2) $r_X(s, t) = \sigma_X(s)\sigma_X(t)$ only at $s = t$;

(3) $\sup_{t\in[0,1]}\sigma_X(t) = \sigma_X(t_0) = 1$ for a unique $t_0 \in (0, 1)$, $\sigma_X(t)$ is twice
continuously differentiable at $t_0$, $\sigma_X'(t_0) = 0$, $\sigma_X''(t_0) < 0$ and $E(X'(t_0))^2 > 0$;

(4) $(\log v)\sup_{s,t:|s-t|\ge v}|r_X(s, t)| \to 0$ as $v \to \infty$.
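Then, for all $x \in \mathbb R$,

$$\lim_{T\to\infty}\Pr\left\{\sup_{t\in[0,T]}|X(t)| \le \frac{x}{a_T} + b_T\right\} = e^{-e^{-x}},$$

where $a_T = \sqrt{2\log T}$ and

$$b_T = a_T - \frac{\log\log T + \log\pi - \log\sqrt{1 - E(X'(t_0))^2/\sigma_X''(t_0)}}{2a_T}.$$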
Proof. Under the hypotheses (1)-(3), the correlation $R_X(s, t) = r_X(s, t)/
(\sigma_X(s)\sigma_X(t))$ admits the following development near $(t_0, t_0)$:

$$R_X(s, t) = 1 - \left(\frac{E(X'(t_0))^2}{2} + D(t, s)\right)(t - s)^2 + o((t - s)^2),$$

where $D(t, s)$ is continuous at $(t_0, t_0)$ and satisfies $D(t_0, t_0) = 0$ [see, e.g., the
proof of Corollary 2.1 in Konstant and Piterbarg (1993)]. Because of this all
the hypotheses of Theorem 1 in Hüsler (1999), or of Theorem 1 in Piterbarg
and Seleznjev (1994) are satisfied with $2a = -\sigma_X''(t_0)$ and $2b = E(X'(t_0))^2$.
Therefore, as $T \to \infty$,

$$\Pr\left\{\sup_{t\in[0,T]}|X(t)| \le u_T\right\} \to \exp\{-\exp(-x)\},$$

where $u = u_T$ is the solution to the equation



$$T\,H^{a/b}\,\frac{e^{-u^2/2}}{\sqrt{2\pi}\,u} = e^{-x},$$

and where $H^{a/b}$ equals $\sqrt{1 + (b/a)}$ [cf. Konstant and Piterbarg (1993), page 87]. Solving this equation as, for example, in the proof of Theorem 1.5.3 in Leadbetter et al. (1983), gives the result. □

We will apply this theorem to the process

$$X_r(t) = Y^{(r)}(t)/\sigma_r$$

for $r = 3,4$, which has period one and where $\sigma_r$ is given in Lemma 1, in combination with Proposition 5, to obtain the following result.

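Proposition 9. Let $\mathcal{D}$, $j_n$ be as in Corollary 6. If $f_n^{(r)} := f_n(y,j_n)$ is the wavelet density estimator from (2.1) based on the quadratic ($r = 3$) or the cubic ($r = 4$) Battle-Lemarie scaling function $\phi_r$, then, for every $x \in \mathbb{R}$,

$$\sup_{f \in \mathcal{D}} \left| \Pr_f\left\{ A_n\left( \sqrt{n2^{-j_n}} \left\| \frac{f_n^{(r)} - Ef_n^{(r)}}{\sigma_r\sqrt{f}} \right\|_{[0,1]} - B_n^{(r)} \right) \le x \right\} - e^{-e^{-x}} \right| \to 0, \qquad r = 3,4,$$

where $\sigma_r$ is given in Lemma 1, $A_n = A(j_n)$, $B_n^{(r)} = B^{(r)}(j_n)$ with

$$A(l) = \sqrt{2(\log 2)l} \quad\text{and}\quad B^{(r)}(l) = A(l) - \frac{\log l + \log(\pi\log 2) - \log\sqrt{1 + D_r}}{2A(l)}, \qquad r = 3,4, \tag{4.30}$$

$D_3 = \sum_k (a_k^{(3)} - a_{k-2}^{(3)})^2/M$ with $M$ as in Lemma 1, and $D_4 = 9\sum_k (a_k^{(4)} - a_{k-2}^{(4)})^2/C$ with $C$ as in (4.53).
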
Proof. We start with the case $r = 3$, the quadratic Battle-Lemarie wavelet. We first verify the hypotheses of Theorem 2 for the process $X_r$, with $\sigma_X(t) = \sigma(t)$ and $r_X(s,t) = r(s,t)$. We drop the sub- or superindex $r = 3$ from $X$, $Y$, $\sigma$. In this case Lemma 1 gives

$$\sigma^2 = \sigma^2(1/2) = \frac14\sum_k (a_k + a_{k-1})^2 + \frac{M}{32}.$$

As in (4.23), we can write

$$Y(t) = \sum_{k \in \mathbb{Z}} \phi_3(t+k)g_k, \qquad t \in \mathbb{R},$$

where the $g_k$ are i.i.d. standard normal and where

$$\phi_3(t+k) = \tfrac12(a_k - 2a_{k-1} + a_{k-2})t^2 + (a_{k-1} - a_{k-2})t + \tfrac12(a_{k-1} + a_{k-2}), \qquad 0 \le t \le 1,$$

by the computation in (4.49) below. It follows that $\phi_3$ is differentiable for all $t \in \mathbb{R}$ with the derivative

$$\phi_3'(t+k) = (a_k - 2a_{k-1} + a_{k-2})t + (a_{k-1} - a_{k-2}), \qquad 0 \le t \le 1. \tag{4.31}$$

In particular the process $Y'(t) = \sum_k \phi_3'(t+k)g_k$ is defined since the coefficients $a_k$ have exponential decay, and

$$E\left( \frac{Y(t+h) - Y(t) - hY'(t)}{h} \right)^2 = \sum_k \left( \frac{\phi_3(t+k+h) - \phi_3(t+k) - h\phi_3'(t+k)}{h} \right)^2 \to 0$$

as $h \to 0$ (for $0 \le t, t+h \le 1$, the quantity inside the parenthesis is dominated by $h|a_k - 2a_{k-1} + a_{k-2}|$ which is square summable). This shows that the process $Y$ (and hence also $X$) is differentiable in quadratic mean. Moreover, $Y$ (and $X$) has a.s. continuous sample paths [note that $Y(t) = t^2G_1 + tG_2 + C$ for a constant $C$ and a bivariate normal variable $(G_1,G_2)$]. Hence, condition 1 in Theorem 2 is satisfied.

Condition 3 is also satisfied with $t_0 = 1/2$: By Lemma 1, $\sigma_X(1/2) = \sigma(1/2)/\sigma(1/2) = 1$, and this maximum is strict. Moreover, using (4.51) below and the comments after it, we have

$$\sigma^2(t) - \sigma^2(1/2) = \frac{M}{2}t^2(t-1)^2 - \frac{M}{32} = \frac{M}{2}\left[ \left(t - \frac12\right)^2 - \frac14 \right]^2 - \frac{M}{32},$$

which implies that $\sigma'(1/2) = 0$ and also gives

$$\sigma_X''(1/2) = \sigma''(1/2)/\sigma = -M/(4\sigma^2). \tag{4.32}$$

Finally, using (4.31),

$$E(X'(1/2))^2 = \sum_k (\phi_3'(1/2 + k))^2/\sigma^2 = \sum_k (a_k - a_{k-2})^2/(4\sigma^2) > 0 \tag{4.33}$$

(if the last sum were zero, the exponential decay of the $a_k$ would be contradicted) which completes the verification of condition (3) in Theorem 2.

We next consider condition 2. Recall (4.24) and (4.25); if this condition does not hold, then there exist $s, t$ in $[0,1)$ and $m \in \mathbb{N} \cup \{0\}$ with $s \ne t$ or with $m > 0$, or both, such that

$$\sum_k \phi(t+k)\phi(s+m+k) = \pm\sqrt{\sum_k \phi^2(t+k)}\sqrt{\sum_k \phi^2(s+k)}.$$

The right-hand side of this equation is different from zero for all $s$ and $t$ by (4.51) below. Consequently, this identity is satisfied if and only if there exists $\lambda \ne 0$ such that the vector with $k$th coordinate $\phi(t+k)$ equals $\lambda$ times the vector with $k$th coordinate $\phi(s+m+k)$. By (4.27) this is equivalent to the existence of $\lambda \ne 0$ satisfying the infinite system of equations

$$N_3(t)a_k + N_3(t+1)a_{k-1} + N_3(t+2)a_{k-2} = \lambda N_3(s)a_{k+m} + \lambda N_3(s+1)a_{k+m-1} + \lambda N_3(s+2)a_{k+m-2},$$

$k \in \mathbb{Z}$, for some $s$, $t$ and $m$ satisfying the specified conditions. If we let $v^{(i)} \in \ell_2$ be defined by the coordinates $v_k^{(i)} = a_{k-i}$, then we can write this system of equations as the vector equation

$$N_3(t)v^{(0)} + N_3(t+1)v^{(1)} + N_3(t+2)v^{(2)} = \lambda N_3(s)v^{(-m)} + \lambda N_3(s+1)v^{(-m+1)} + \lambda N_3(s+2)v^{(-m+2)}. \tag{4.34}$$

It is not difficult to see that if this equation has a solution with $m = 0$ and $s \ne t$, $s,t \in [0,1)$, or with $m > 0$ and $s,t \in [0,1)$, then a finite nonvoid set of vectors $v^{(i)}$ are linearly dependent. For instance, for $m = 0$, this equation becomes (using the explicit form of $N_3$ in Section 4.4.3)

$$(t^2 - \lambda s^2)v^{(0)} + 2[(3(1-\lambda)/4) - (t - 1/2)^2 + \lambda(s - 1/2)^2]v^{(1)} + ((1-t)^2 - \lambda(1-s)^2)v^{(2)} = 0,$$

and the coefficients are all zero if and only if $\lambda = 1$ and $s = t$ so that if this equation has a solution with $\lambda \ne 0$ and $s \ne t$, both in $[0,1)$, then the vectors $v^{(i)}$, $i = 0,1,2$, are linearly dependent. It is even easier to see that a similar conclusion holds for $m > 0$ (in this case, since $N_3(t+2) \ne 0$ for $t \in [0,1)$, if equation (4.34) has a solution for some $m > 0$, then the vectors $v^{(i)}$, $v^{(-m+i)}$, $i = 0,1,2$, are linearly dependent). However, suppose $\sum_{i=1}^r \lambda_i v^{(\ell_i)} = 0$ for some $r > 0$, some $\ell_i \in \mathbb{Z}$ and $\lambda_i \ne 0$. Since $\phi_3(\cdot - \ell) = \sum_k v_k^{(\ell)} N_3(\cdot - k)$, $\sum_{i=1}^r \lambda_i\phi_3(\cdot - \ell_i) = 0$ follows which is impossible unless $\lambda_i = 0$ for all $i$ because different integer translates of a father wavelet are orthogonal, hence linearly independent. This verifies condition 2 in Theorem 2.

Finally we check condition 4. Let us recall that, by the exponential decay of $\phi_3$, there exist $c_1, c_2 > 0$ such that $|\phi_3(x)| \le c_1e^{-c_2|x|}$ for all $x$ so that, for $t - s \ge v > 0$,

$$\sigma^2|r(s,t)| = \left| \sum_k \phi_3(s - k + (t-s))\phi_3(s-k) \right| \le \sum_{k:|k-s| \le (t-s)/2} |\phi_3(s - k + (t-s))||\phi_3(s-k)| + \sum_{k:|k-s| > (t-s)/2} |\phi_3(s - k + (t-s))||\phi_3(s-k)| \le 2c_1\left\| \sum_k |\phi_3(\cdot - k)| \right\|_\infty e^{-c_2v/2},$$

proving condition 4. Hence, the process $X(t) = Y(t)/\sigma$ satisfies the hypotheses of Theorem 2 with $T = 2^{j_n}$ and constants given in (4.32) and (4.33). Combining the resulting asymptotic distribution for $\sup_{0 \le t \le 2^{j_n}} |X(t)|$ with Proposition 5 (which applies, as shown in Section 4.3.1), we obtain Proposition 9 for the quadratic Battle-Lemarie wavelet.

Now we consider the cubic Battle-Lemarie wavelet density estimator (case $r = 4$). Theorem 2 cannot be applied directly to $X_4(t) = \sum_k \phi_4(t-k)g_k/\sigma_4$ because the variance of this cyclostationary Gaussian process of period 1 has its maxima at $t = 0$ which contradicts condition 3 ($t_0 = 0,1$ are not unique and are not interior to $[0,1]$). This is not a serious difficulty and can be easily dealt with. With $Y_n(t) := Y^{(4)}(2^{j_n}t) = \sum_k \phi_4(2^{j_n}t + k)g_k$, we will obtain a limit theorem for

$$\sup_{t \in [0,1]} |Y_n(t)|/\sigma = \sup_{t \in [0,2^{j_n}]} \left| \sum_k \phi_4(t+k)g_k \right|/\sigma =: \sup_{t \in [0,2^{j_n}]} |Y(t)|/\sigma$$

via limit theorems for

$$Z^- = \sup_{\delta/2^{j_n} \le t \le 1 - \delta/2^{j_n}} |Y_n(t)|/\sigma \quad\text{and}\quad Z^+ = \sup_{-\delta/2^{j_n} \le t \le 1 + \delta/2^{j_n}} |Y_n(t)|/\sigma,$$

where

$$\sigma^2 = \sigma_4^2 = \frac{1}{36}\sum_k (a_{k-1} + 4a_k + a_{k+1})^2$$

(cf. Lemma 1) and where $0 < \delta < 1$ (e.g., $\delta = 1/2$). Set $Z(t) = Y(t+\delta)/\sigma$, $t \ge 0$. Then,

$$Z^- = \sup_{0 \le t \le 2^{j_n} - 2\delta} |Z(t)|$$

and $Z(t)$ is still a cyclostationary centered Gaussian process since $r_Z(t,t+v) = r_Y(t+\delta, t+\delta+v)$ which is periodic with period 1 for each $v$. But now $\sigma_Z^2(t)$ attains a unique and strict maximum on $[0,1]$ at the point $t_0 = 1 - \delta$. It is easy to see, proceeding in analogy with the quadratic spline case and using computations from Section 4.4.3, that conditions 1-4 in Theorem 2 are satisfied with

$$\sigma_Z''(1-\delta) = -C/(36\sigma^2) \quad\text{and}\quad E(Z'(1-\delta))^2 = \sum_k (a_k - a_{k-2})^2/(4\sigma^2),$$

where $C > 0$ is defined in (4.53). Therefore, if we set

$$A_n^- = \sqrt{2\log(2^{j_n} - 2\delta)}, \qquad B_n^- = A_n^- - \frac{\log\log(2^{j_n} - 2\delta) + \log\pi - \log\sqrt{1 + D_4}}{2A_n^-},$$

Theorem 2 proves that the random variables $A_n^-(Z^- - B_n^-)$ converge in law to the Gumbel distribution. Now, let $A_n$ and $B_n = B_n^{(4)}$ be the constants in (4.30). We have

$$A_n(Z^- - B_n) = A_n^-(Z^- - B_n^-)\frac{A_n}{A_n^-} + A_n(B_n^- - B_n),$$

and, as is easy to see,

$$\frac{A_n^-}{A_n} \to 1 \quad\text{and}\quad A_n(B_n^- - B_n) \to 0 \quad\text{as } n \to \infty.$$

We thus conclude that the sequence $\{A_n(Z^- - B_n)\}$ is weak convergence equivalent to the sequence $\{A_n^-(Z^- - B_n^-)\}$, and therefore, for all $x \in \mathbb{R}$,

$$\Pr\{A_n(Z^- - B_n) \le x\} \to e^{-e^{-x}}.$$

The same argument gives (with $Z(t) = Y(t-\delta)/\sigma$)

$$\Pr\{A_n(Z^+ - B_n) \le x\} \to e^{-e^{-x}};$$

hence, since $Z^- \le \sup_{t \in [0,1]} |Y_n(t)|/\sigma \le Z^+$, we have proved that for all $x$,

$$\Pr\left\{ A_n\left( \sup_{t \in [0,1]} |Y_n(t)|/\sigma - B_n \right) \le x \right\} \to e^{-e^{-x}}.$$

Combining this limit with Proposition 5 which, as argued in Section 4.3.1, applies to the projection kernels of the cubic spline wavelet, we obtain Proposition 9 for $r = 4$. □
Remark 3. Regarding higher order Battle-Lemarie wavelets ($r > 4$), notice that, the scaling function $\phi_r$ being a piecewise polynomial function with smooth (and nonconstant) weldings, the absolute maximum of the variance function (4.25) on $[0,1]$ is attained at a finite number of points, perhaps even at a single point (as in the cases $r \le 4$). In this case results from the literature mentioned before Theorem 2 can be applied, and proofs can be given along the lines of the proof of Proposition 9. After this paper was written it was shown that the variance function (4.25) has unique maxima in the case $r < 9$ [see Gine and Madych (2009), where one can also find a more general conjecture related to these questions].

4.4. Remaining proofs. 

4.4.1. Proof of Theorem 1. Throughout this proof we shall often write, in slight abuse of notation, $f_n(l)$ for $f_n(\cdot,l)$. Note also that all densities in $\mathcal{D}$ are bounded by a fixed constant depending only on $\eta$, $b$, and we shall denote this constant by $U$. For every $f \in \mathcal{D}$ there exists a unique $t := t(f)$ such that $f$ satisfies Condition 3 for this $t$. Define

$$B(j,t) = b_2 2^{-jt}, \qquad \sigma(n,l) = \sqrt{\frac{2^l l}{n}},$$

$$j_n^*(t) = \min\left\{ j \in \mathcal{J} : B(j,t) \le \frac{M\sigma(n_2,j)}{4} \right\}. \tag{4.35}$$

It is easy to see that $j_n^*(t)$ satisfies

$$2^{j_n^*(t)} \simeq \left( \frac{n_2}{\log n_2} \right)^{1/(2t+1)} \tag{4.36}$$

so it is a "rate optimal" resolution level for estimating $f \in C^t(\mathbb{R})$. We wish to show that $\hat j_n$ defined in (3.1) is a "good estimate" for $j_n^*(t)$, which is achieved by the following lemma. Recall that $\hat j_n$ is defined so far only up to the choice of the constant $M'$ (cf. Remark 2), which can be retrieved from the proof of the following lemma (and Proposition 10).
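To make the definition of the oracle resolution level concrete, here is a minimal sketch that scans a grid of levels and returns the smallest $j$ with $B(j,t) \le (M/4)\sigma(n,j)$, reading $B(j,t) = b_2 2^{-jt}$ and $\sigma(n,j) = \sqrt{2^j j/n}$ as in (4.35). The grid `range(1, 40)` and the values of `b2` and `M` are illustrative assumptions, not the paper's choices.

```python
import math


def bias_bound(j: int, t: float, b2: float) -> float:
    """B(j, t) = b2 * 2^(-j t)."""
    return b2 * 2.0 ** (-j * t)


def sigma(n: int, j: int) -> float:
    """sigma(n, j) = sqrt(2^j * j / n)."""
    return math.sqrt(2.0 ** j * j / n)


def oracle_level(n: int, t: float, b2: float, M: float, grid) -> int:
    """Smallest j in the grid with B(j, t) <= (M / 4) * sigma(n, j)."""
    for j in grid:
        if bias_bound(j, t, b2) <= (M / 4.0) * sigma(n, j):
            return j
    raise ValueError("no admissible level in the grid")


n, t = 10 ** 6, 1.0
j_star = oracle_level(n, t, b2=1.0, M=4.0, grid=range(1, 40))
# Compare 2^{j*} with the rate (n / log n)^{1/(2t+1)} from (4.36).
print(j_star, 2 ** j_star, (n / math.log(n)) ** (1.0 / (2.0 * t + 1.0)))
```

Because $B(\cdot,t)$ decreases and $\sigma(n,\cdot)$ increases in $j$, the first admissible level found by the scan is the minimum in the definition.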

Lemma 2. (a) We can choose a finite positive constant $M'$ depending only on $K$ such that if $j \in \mathcal{J}_n$ satisfies $j > j_n^*(t)$, then for every $n \in \mathbb{N}$,

$$\Pr_f(\hat j_n = j) \le c2^{-j/c},$$

where the constant $0 < c < \infty$ depends only on $M'$ and $K$.

(b) There exists a positive integer $m$ depending only on $b_1, b_2, \eta$, and a constant $c'$ depending only on $M'$ from part (a) and on $K$ such that for every $j \in \mathcal{J}_n$ satisfying $j < j_n^*(t) - m$ and every $n \in \mathbb{N}$ large enough such that $j_n^*(t) \ge 2$, we have

$$\Pr_f(\hat j_n = j) \le c'2^{-j/c'}.$$
Proof. Since this lemma only involves the sample $S_2$, we set $n = n_2$ for notational simplicity. To prove the first claim, fix $j > j_n^*(t)$ and let $j^- = j - 1$. Then one has

$$\Pr_f(\hat j_n = j) \le \sum_{l \in \mathcal{J}: l \ge j} \Pr_f(\|f_n(j^-) - f_n(l)\|_\infty > M\sigma(n,l)). \tag{4.37}$$

Observe that

$$\|Ef_n(j^-) - Ef_n(l)\|_\infty \le B(j^-,t) + B(l,t) \le 2B(j_n^*(t),t) \le (M/2)\sigma(n,j_n^*(t)) \le (M/2)\sigma(n,l)$$

by definition of $j_n^*(t)$ and since $l \ge j > j_n^*(t)$. Consequently, the $l$th probability in the last sum is bounded by

$$\Pr_f(\|f_n(j^-) - f_n(l) - Ef_n(j^-) + Ef_n(l)\|_\infty > (M - (M/2))\sigma(n,l))$$
$$\le \Pr_f(\|f_n(j^-) - Ef_n(j^-)\|_\infty > (M/4)\sigma(n,l)) + \Pr_f(\|f_n(l) - Ef_n(l)\|_\infty > (M/4)\sigma(n,l)) \le d2^{-l/d},$$

where $0 < d < \infty$ depends only on $M', K$, and where we have used the fact that we can choose $M'$ large enough but finite depending only on $K$ so that Proposition 10 below applies. Feeding this bound into (4.37) and summing the series proves the first claim of the lemma.

To prove the second claim, fix $j < j_n^*(t) - m$ where $m$ is some positive integer, and observe that

$$\Pr_f(\hat j_n = j) \le \Pr_f(\|f_n(j) - f_n(j_n^*(t))\|_\infty \le M\sigma(n,j_n^*(t))) \tag{4.38}$$

and that

$$\|f_n(j) - f_n(j_n^*(t))\|_\infty \ge \|Ef_n(j) - Ef_n(j_n^*(t))\|_\infty - \|f_n(j) - Ef_n(j) - f_n(j_n^*(t)) + Ef_n(j_n^*(t))\|_\infty$$

so that the probability in (4.38) is bounded by

$$\Pr_f(\|f_n(j) - Ef_n(j) - f_n(j_n^*(t)) + Ef_n(j_n^*(t))\|_\infty \ge (b_1/b_2)B(j,t) - B(j_n^*(t),t) - M\sigma(n,j_n^*(t))).$$

By definition of $j_n^*(t)$ and $B(j,t)$, we have

$$\frac{b_1}{b_2}B(j,t) - B(j_n^*(t),t) = \frac{b_1}{b_2}2^{(j_n^*(t) - j)t}B(j_n^*(t),t) - B(j_n^*(t),t)$$

and, using also $\sqrt{(j_n^*(t) - 1)/j_n^*(t)} \ge 1/\sqrt{2}$ in view of $j_n^*(t) \ge 2$,

$$B(j_n^*(t) - 1,t) > (M/4)\sigma(n,j_n^*(t) - 1) \ge (M/4)2^{-1}\sigma(n,j_n^*(t))$$

so that the last probability is bounded by

$$\Pr_f\left( \|f_n(j) - Ef_n(j) - f_n(j_n^*(t)) + Ef_n(j_n^*(t))\|_\infty \ge \left[ \frac{M((b_1/b_2)2^{\eta(m-1)} - 2^{-\eta})}{8} - M \right]\sigma(n,j_n^*(t)) \right)$$
$$\le \Pr_f\left( \|f_n(j) - Ef_n(j)\|_\infty \ge 2^{-1}\left[ \frac{M((b_1/b_2)2^{\eta(m-1)} - 2^{-\eta})}{8} - M \right]\sigma(n,j_n^*(t)) \right)$$
$$+ \Pr_f\left( \|f_n(j_n^*(t)) - Ef_n(j_n^*(t))\|_\infty \ge 2^{-1}\left[ \frac{M((b_1/b_2)2^{\eta(m-1)} - 2^{-\eta})}{8} - M \right]\sigma(n,j_n^*(t)) \right).$$

We can now choose $m \ge 2$ sufficiently large but finite and only depending on $b_1, b_2, \eta$ so that, using Proposition 10 below, the last two probabilities can be made less than $c'2^{-j/c'}$ for some constant $c'$ that depends only on $M', K$. □

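For the rest of the proof we assume that $M'$ and $n$ have been chosen large enough so that Lemma 2 holds. As a first consequence we obtain

$$\sup_{f \in \mathcal{D}} \Pr_f\{2^{\hat j_n} > 2^{j_n^*(t)}\} = \sup_{f \in \mathcal{D}} \Pr_f\{\hat j_n > j_n^*(t)\} \le \sup_{f \in \mathcal{D}} \sum_{j \in \mathcal{J}: j_n^*(t) < j \le j_{\max}} \Pr_f(\hat j_n = j) \le c\sum_{j_{\min} \le j \le j_{\max}} 2^{-j/c} \le c''2^{-j_{\min}/c''} \to 0 \tag{4.39}$$

as $n \to \infty$ which already proves (3.8) by using (4.36) and the definition of $\sigma_n$. [Note that $\hat j_n \in \mathcal{J}_n$ for all $n$ implies that we have to prove (3.8) only for large $n$.] Moreover,

$$\sup_{f \in \mathcal{D}} \Pr_f(2^{-\hat j_n} > 2^m 2^{-j_n^*(t)}) \le \sup_{f \in \mathcal{D}} \Pr_f(\hat j_n < j_n^*(t) - m) \le \sup_{f \in \mathcal{D}} \sum_{j_{\min} \le j < j_n^*(t) - m} \Pr_f(\hat j_n = j) \le c'\sum_{j \ge j_{\min}} 2^{-j/c'} = c'''2^{-j_{\min}/c'''} \to 0, \tag{4.40}$$

so that

$$\sup_{f \in \mathcal{D}} \Pr_f\{\hat j_n \notin [j_n^*(t) - m, j_n^*(t)]\} \to 0 \tag{4.41}$$

as $n \to \infty$ which we shall use in the following lemma.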
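Lemma 3. We have ($E$ denoting expectation w.r.t. $S_1$)

$$\sigma_n^{-1}\left| \left\| \frac{f_{n_1}(\hat j_n + u_n) - f}{c(K)\sqrt{f_{n_1}(\hat j_n + u_n)}} \right\|_{[0,1]} - \left\| \frac{f_{n_1}(\hat j_n + u_n) - Ef_{n_1}(\hat j_n + u_n)}{c(K)\sqrt{f}} \right\|_{[0,1]} \right| = o_P(1/\sqrt{\log n})$$

uniformly in $\mathcal{D}$.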



Proof. We will use $n = n_1 \simeq n_2$ without mentioning in this proof. A few preliminary observations are necessary: We first establish that

$$\sigma_n^{-1}\|f_{n_1}(\hat j_n + u_n) - f\|_\infty = O_P(\sqrt{\log n}) \tag{4.42}$$

uniformly in $\mathcal{D}$; note that by definition of $j_n^*(t)$ we have

$$B(j + u_n,t) \le (M/4)\sigma(n_2,j + u_n) \le C''\sigma(n_1,j + u_n)$$

for $j \ge j_n^*(t) - m$ and for $n$ large enough s.t. $u_n \ge m$ and where $C''$ depends only on $K$, $n_1/n_2$ and the envelope $U$ of $\mathcal{D}$. Hence, using (4.41) and Proposition 10,

$$\Pr_f\{\sigma^{-1}(n_1,\hat j_n + u_n)\|f_{n_1}(\hat j_n + u_n) - f\|_\infty > CC''\}$$
$$\le \sum_{j \in [j_n^*(t) - m, j_n^*(t)]} \Pr_f\{\sigma^{-1}(n_1,j + u_n)\|f_{n_1}(j + u_n) - f\|_\infty > CC''\} + o(1)$$
$$\le \sum_{j \in [j_n^*(t) - m, j_n^*(t)]} \Pr_f\{\|f_{n_1}(j + u_n) - Ef_{n_1}(j + u_n)\|_\infty > CC''\sigma(n_1,j + u_n) - B(j + u_n,t)\} + o(1)$$
$$\le \sum_{j \in [j_n^*(t) - m, j_n^*(t)]} \Pr_f\{\|f_{n_1}(j + u_n) - Ef_{n_1}(j + u_n)\|_\infty > (CC'' - C'')\sigma(n_1,j + u_n)\} + o(1)$$
$$= o(1)$$

for $C$ and $n$ large enough depending only on $\mathcal{D}$ (so that this bound is uniform in $\mathcal{D}$).
Second, (4.36), (4.39), (4.42) and Condition 4 imply that

$$\|f_{n_1}(\hat j_n + u_n) - f\|_\infty = O_P\left( \sqrt{\frac{2^{j_n^*(t)}\log n}{n}}\,2^{u_n/2} \right) = o_P(1/\log^2 n)$$

uniformly in $\mathcal{D}$, and since $f \ge \delta$ on $[0,1]$, we also have

$$\sup_{y \in [0,1]} |f_{n_1}^{-1}(y,\hat j_n + u_n)| = O_P(1), \tag{4.43}$$

uniformly in $\mathcal{D}$. Consequently, since the root-transformation is Lipschitz on intervals bounded away from zero, we obtain, uniformly in $\mathcal{D}$, $\|f_{n_1}^{-1/2}(\hat j_n + u_n)\|_{[0,1]} = O_P(1)$ and

$$\sup_{y \in [0,1]} \left| \sqrt{f_{n_1}(y,\hat j_n + u_n)} - \sqrt{f(y)} \right| = o_P(1/\log^2 n). \tag{4.44}$$

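We now prove the lemma. First, using the above facts,

$$\sigma_n^{-1}\left| \left\| \frac{f_{n_1}(\hat j_n + u_n) - f}{c(K)\sqrt{f_{n_1}(\hat j_n + u_n)}} \right\|_{[0,1]} - \left\| \frac{f_{n_1}(\hat j_n + u_n) - f}{c(K)\sqrt{f}} \right\|_{[0,1]} \right| \le \sigma_n^{-1}\left\| \frac{(f_{n_1}(\hat j_n + u_n) - f)(\sqrt{f_{n_1}(\hat j_n + u_n)} - \sqrt{f})}{c(K)\sqrt{f}\sqrt{f_{n_1}(\hat j_n + u_n)}} \right\|_{[0,1]} = o_P(1/\log n).$$

Second, using Condition 3, (4.40) and Condition 4, we have for some constant $d$ that depends only on $b_2, c(K), \delta$ that

$$\sigma_n^{-1}\left| \left\| \frac{f_{n_1}(\hat j_n + u_n) - f}{c(K)\sqrt{f}} \right\|_{[0,1]} - \left\| \frac{f_{n_1}(\hat j_n + u_n) - Ef_{n_1}(\hat j_n + u_n)}{c(K)\sqrt{f}} \right\|_{[0,1]} \right| \le d\,\sigma_n^{-1}\|Ef_{n_1}(\hat j_n + u_n) - f\|_{[0,1]}$$
$$= O_P\left( \sqrt{n}\,2^{-j_n^*(t)(t + (1/2))}2^{-u_n(t + (1/2))} \right) = o_P(1/\sqrt{\log n})$$

since $f$ is bounded from below and since $2^{-j_n^*(t)(t + 1/2)} \simeq (n/\log n)^{-1/2}$ by (4.36). □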

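By the above lemma, and since $A_n = A(\hat j_n + u_n) \simeq \sqrt{\log n}$, to complete the proof of the theorem, it suffices to prove the required limit for

$$\Pr_f\left\{ A(\hat j_n + u_n)\left( \sigma_n^{-1}\left\| \frac{f_{n_1}(\hat j_n + u_n) - Ef_{n_1}(\hat j_n + u_n)}{c(K)\sqrt{f}} \right\|_{[0,1]} - B(\hat j_n + u_n) \right) \le x \right\} = \Pr_f(W_n(\hat j_n + u_n,x)),$$

where we introduce the events

$$W_n(j,x) = \left\{ A(j)\left( \sqrt{n_1 2^{-j}}\left\| \frac{f_{n_1}(j) - Ef_{n_1}(j)}{c(K)\sqrt{f}} \right\|_{[0,1]} - B(j) \right) \le x \right\}$$

for $j \in \mathbb{N}$. Recall that Condition 2 implies, for every $j_n \in \mathcal{J}_n$ that

$$\Pr_f(W_n(j_n,x)) \to \zeta(x) := \exp\{-\exp\{-x\}\}$$

as $n \to \infty$, uniformly in $\mathcal{D}$, and note that $j + u_n \in \mathcal{J}_n$ for $j \in [j_n^*(t) - m, j_n^*(t)]$ and $n$ large enough by (4.36). Using (4.41) and independence of $\hat j_n$ and $f_{n_1}(j)$, we have

$$\Pr_f(W_n(\hat j_n + u_n,x)) = \sum_{j \in \mathcal{J}_n} \Pr_f(W_n(j + u_n,x) \cap \{\hat j_n = j\})$$
$$= \sum_{j_n^*(t) - m \le j \le j_n^*(t)} \Pr_f(W_n(j + u_n,x) \cap \{\hat j_n = j\}) + o(1)$$
$$= \sum_{j_n^*(t) - m \le j \le j_n^*(t)} \Pr_f(W_n(j + u_n,x))\Pr_f(\{\hat j_n = j\}) + o(1)$$
$$= \zeta(x)\sum_{j_n^*(t) - m \le j \le j_n^*(t)} \Pr_f(\hat j_n = j) + \sum_{j_n^*(t) - m \le j \le j_n^*(t)} (\Pr_f(W_n(j + u_n,x)) - \zeta(x))\Pr_f(\hat j_n = j) + o(1).$$

But the limit of the last expression is $\zeta(x)$, completing the proof of Theorem 1. The first quantity converges to $\zeta(x)$ as $n \to \infty$ since

$$\sum_{j_n^*(t) - m \le j \le j_n^*(t)} \Pr_f(\hat j_n = j) = 1 - \sum_{j \notin [j_n^*(t) - m, j_n^*(t)]} \Pr_f(\hat j_n = j) \to 1$$

uniformly in $f \in \mathcal{D}$ by (4.41). The second quantity converges to zero since

$$\max_{j_n^*(t) - m \le j \le j_n^*(t)} |\Pr_f(W_n(j + u_n,x)) - \zeta(x)| \to 0,$$

in view of Condition 2 and since $m$ is finite (depending only on $\mathcal{D}$).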

4.4.2. Proofs and complementary results for Section 3.5.

Proof of Proposition 4. We use the fact that the wavelet basis characterizes the space $C^t(\mathbb{R})$ for $t < r - 1$, cf. Definition 1. Since

$$|\beta_{lk}(g)| = \left| 2^{l/2}\int \psi(2^lx - k)g(x)\,dx \right| \le 2^{-l/2}\|\psi\|_1\|g\|_\infty$$

for every $l, k$ and every bounded function $g$, we have for $g = K_j(f) - f$, whose wavelet coefficients are zero for $l < j$, that

$$\|K_j(f) - f\|_\infty \ge \|\psi\|_1^{-1}\sup_{l \ge j, k \in \mathbb{Z}} |2^{l/2}\beta_{lk}(f)|. \tag{4.45}$$

Take

$$E_m(k) = \{f \in C^t(\mathbb{R}) : |\beta_{lk}(f)| \ge 2^{-l(t+1/2)}2^{-m} \text{ for every } l \in \mathbb{N}\}$$

and set

$$A_m(k) = \{h \in C^t(\mathbb{R}) : \|h - f\|_{t,\infty} < 2^{-m-1} \text{ for some } f \in E_m(k)\}$$

so that (recalling Definition 1), for every $h \in A_m(k)$ and every $l$, $|\beta_{lk}(h)| \ge 2^{-l(t+1/2)}2^{-m-1}$. Consequently, using (4.45), we have

$$\|K_j(h) - h\|_\infty \ge \|\psi\|_1^{-1}2^{-m-1}2^{-jt} \tag{4.46}$$

for every nonnegative integer $j$ and every $h \in A_m(k)$. Define now

$$A = \bigcup_{m \ge 0, k \in \mathbb{Z}} A_m(k)$$

all of whose elements satisfy the lower bound (4.46) for some $m$, and therefore $A \subset \mathcal{N}^c$. $A$ is clearly open and it is also dense in $C^t(\mathbb{R})$: Let $g \in C^t(\mathbb{R})$ be arbitrary, and define the function $g_m$ by its wavelet coefficients $\alpha_k(g_m) = \alpha_k(g)$ for all $k$ and $\beta_{lk}(g_m)$ equal to $\beta_{lk}(g)$ when $|\beta_{lk}(g)| > 2^{-l(t+1/2)}2^{-m}$ and equal to $2^{-l(t+1/2)}2^{-m}$, otherwise. Clearly $g_m \in A$ for every $m$, and for $\varepsilon > 0$ arbitrary we can choose $m$ large enough such that

$$\|g - g_m\|_{t,\infty} = \sup_{l,k} 2^{l(t+1/2)}|\beta_{lk}(g) - \beta_{lk}(g_m)| \le 2^{-m+1} < \varepsilon.$$

This proves that $\mathcal{N}$ is contained in the complement of an open and dense set, hence itself must be nowhere dense. □
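The density argument in the proof can be illustrated numerically. The sketch below lifts every wavelet coefficient that is too small to the threshold $2^{-l(t+1/2)}2^{-m}$ and checks that the weighted sup-distance between the original and the modified coefficient arrays is at most $2^{-m+1}$. Storing coefficients in a dictionary keyed by $(l,k)$ and the function names are assumptions made only for this example.

```python
def threshold(l: int, t: float, m: int) -> float:
    """Threshold 2^(-l (t + 1/2)) * 2^(-m) at resolution level l."""
    return 2.0 ** (-l * (t + 0.5)) * 2.0 ** (-m)


def lift_small_coefficients(beta: dict, t: float, m: int) -> dict:
    """Keep beta_{lk} if it exceeds the threshold in absolute value, else set it to the threshold."""
    lifted = {}
    for (l, k), b in beta.items():
        thr = threshold(l, t, m)
        lifted[(l, k)] = b if abs(b) > thr else thr
    return lifted


def weighted_sup_distance(beta: dict, gamma: dict, t: float) -> float:
    """sup over (l, k) of 2^(l (t + 1/2)) * |beta_{lk} - gamma_{lk}|."""
    return max(2.0 ** (l * (t + 0.5)) * abs(beta[(l, k)] - gamma[(l, k)])
               for (l, k) in beta)


# Hypothetical coefficient array, for illustration only.
beta = {(0, 0): 0.5, (1, 0): 0.0, (2, 1): -1e-6, (3, 2): 0.2}
t, m = 0.5, 3
gamma = lift_small_coefficients(beta, t, m)
print(weighted_sup_distance(beta, gamma, t), 2.0 ** (-m + 1))
```

Each modified coefficient differs from the original by less than twice the threshold, which is where the bound $2^{-m+1}$ comes from.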

We now construct some explicit densities in the class $\mathcal{D}$, further illustrating the genericity of Condition 3. For the case of compactly supported wavelets, let $f$ be any density in the class $\mathcal{D}(t,D,\delta,F)$. The wavelet series of $f$ is

$$f = K_0(f) + \sum_{l=0}^\infty \sum_k \beta_{lk}(f)\psi_{lk}.$$

If $\psi$ is supported in $[0,a]$ (if necessary after a translation), pick a $k_0 \in \mathbb{Z}$ and $l_0$ large so that $[k_0/2^l, (k_0 + a)/2^l] \subset [0,1]$ for every $l \ge l_0$, and define $g(x)$ to have exactly the same wavelet series as $f$, but, for $l \ge l_0$, $\beta_{lk_0}(g) = \beta_{lk_0}(f)$ if $|\beta_{lk_0}(f)| \ge 2^{-l(t+1/2)}$ and $\beta_{lk_0}(g) = 2^{-l(t+1/2)}$, otherwise. By choice of $k_0$ this modification takes place only in $[k_0/2^l, (k_0 + a)/2^l] \subset [0,1]$ where $f$ is larger than or equal to $\delta$. Moreover,

$$\|f - g\|_\infty = \left\| \sum_{l \ge l_0} (\beta_{lk_0}(f) - 2^{-l(t+1/2)})1\{|\beta_{lk_0}(f)| < 2^{-l(t+1/2)}\}\psi_{lk_0} \right\|_\infty \le 2\|\psi\|_\infty\sum_{l \ge l_0} 2^{-lt}$$

so that by choosing $l_0$ large enough this quantity can be made as small as desired; in particular $g(x) \ge \delta(1 - \varepsilon)$ for $x \in [F_1,F_2]$ and every $\varepsilon > 0$. Furthermore, since $f$ integrates to one, and since $\psi$ integrates to zero, $g$ is a density, and its wavelet coefficients at $k_0$ satisfy the lower bound $|\beta_{lk_0}(g)| \ge 2^{-l(t+1/2)}(1 - \varepsilon)$ for $l \ge l_0$. Using (4.45) this verifies the lower bound in Condition 3.

If $\psi$ does not have compact support, as is the case for Battle-Lemarie wavelets, the above modification might lead to functions that are negative somewhere, and hence not densities. To remedy this, start with any function $f$ that is in $C^t(\mathbb{R})$ and bounded from below by $\delta$ for every $x \in \mathbb{R}$. Then the above modification at $k_0$, $l \ge l_0$ (and the proof) gives us a function $\tilde f$ which is bounded from below by $\delta/2$ on all of $\mathbb{R}$ and satisfies $|\beta_{lk_0}(\tilde f)| \ge 2^{-l(t+1/2)}$ for every $l \ge l_0$. Multiply $\tilde f$ by a positive, bounded, integrable, infinitely-differentiable function $h$ (possibly compactly-supported) that is equal to one on $[-1,1]$, and then divide $\tilde fh$ by $\|\tilde fh\|_1$ giving a (possibly compactly supported) density on $\mathbb{R}$ which is again contained in $C^t(\mathbb{R})$ since this space is a multiplication algebra. The wavelet coefficients at $k_0$ are

$$|\beta_{lk_0}(\tilde fh)| = \left| \int \tilde f(x)h(x)\psi_{lk_0}(x)\,dx \right| = \left| \int_{[-1,1]} \tilde f(x)\psi_{lk_0}(x)\,dx + \int_{[-1,1]^c} \tilde f(x)h(x)\psi_{lk_0}(x)\,dx \right|$$
$$\ge |\beta_{lk_0}(\tilde f)| - 2^{l/2}(\|\tilde f\|_\infty(1 + \|h\|_\infty))\int_{[-1,1]^c} |\psi(2^lx - k_0)|\,dx,$$

and the quantity we subtract is, by exponential decay of $\psi$, less than or equal to a constant times $2^{-l/2}\lambda^{c|2^l - k_0|}$ for some $0 < \lambda < 1$. For $l \ge l_0$ large enough this quantity can be made smaller than any power of $2^{-l}$, hence the same lower bound for the wavelet coefficients at $k_0$ holds for the density $\tilde fh$. Again, using (4.45) this verifies the lower bound in Condition 3.
4.4.3. Proof of Lemma 1. The piecewise linear Battle-Lemarie wavelet. In this case

$$N_2(x) = 1_{[0,1)} * 1_{[0,1)}(x) = \begin{cases} x, & \text{for } 0 \le x \le 1, \\ 2 - x, & \text{for } 1 \le x \le 2, \end{cases}$$

which yields (writing $a_k$ for $a_k^{(2)}$)

$$\phi_2(k+t) = a_kN_2(t) + a_{k-1}N_2(t+1) = a_kt + a_{k-1}(1-t), \qquad k \in \mathbb{Z},\ 0 \le t \le 1. \tag{4.47}$$

Then,

$$\sum_k \phi_2^2(k) = \sum_k a_k^2$$

and, using $2\sum_k a_{k-1}(a_k - a_{k-1}) = -\sum_k (a_k - a_{k-1})^2$,

$$\sum_k \phi_2^2(k+t) = \sum_k (a_k - a_{k-1})^2t^2 + 2\sum_k a_{k-1}(a_k - a_{k-1})t + \sum_k a_k^2 = \sum_k (a_k - a_{k-1})^2t(t-1) + \sum_k a_k^2. \tag{4.48}$$

Now, $\sum_k (a_k - a_{k-1})^2 > 0$; otherwise all the $a_k$ would be identical which is impossible because the $a_k$ decay exponentially. Therefore, $\sum_k \phi_2^2(k+t)$ has a unique maximum on $[0,1)$, at $t = 0$, that is, the variance function $\sum_k \phi_2^2(t-k)$ has only isolated maxima which are at the points $t = l \in \mathbb{Z}$.
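The closed form for the piecewise linear case can be checked numerically. The sketch below evaluates $\sum_k \phi_2^2(k+t)$ directly from $\phi_2(k+t) = a_kt + a_{k-1}(1-t)$ and compares it with $\sum_k (a_k - a_{k-1})^2t(t-1) + \sum_k a_k^2$. A short finitely supported sequence stands in for the exponentially decaying coefficients $a_k$; that substitution and the function names are assumptions of this example.

```python
def variance_linear(a, t):
    """Direct evaluation of sum_k phi_2(k + t)^2 with phi_2(k + t) = a_k t + a_{k-1} (1 - t)."""
    p = [0.0] + list(a) + [0.0]  # zero-padding: a_k = 0 outside the list
    return sum((p[i] * t + p[i - 1] * (1.0 - t)) ** 2 for i in range(1, len(p)))


def variance_linear_closed_form(a, t):
    """sum_k (a_k - a_{k-1})^2 * t (t - 1) + sum_k a_k^2."""
    p = [0.0] + list(a) + [0.0]
    d2 = sum((p[i] - p[i - 1]) ** 2 for i in range(1, len(p)))
    return d2 * t * (t - 1.0) + sum(x * x for x in a)


a = [1.0, -0.5, 0.25]  # hypothetical coefficients
for t in (0.0, 0.25, 0.5, 0.75):
    print(t, variance_linear(a, t), variance_linear_closed_form(a, t))
```

Since $t(t-1) < 0$ on $(0,1)$, the printed values are largest at $t = 0$, in line with the location of the maximum stated above.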

The quadratic Battle-Lemarie wavelet. In this case,

$$N_3(x) = \begin{cases} x^2/2, & \text{for } 0 \le x \le 1, \\ 3/4 - (x - 3/2)^2, & \text{for } 1 \le x \le 2, \\ (3-x)^2/2, & \text{for } 2 \le x \le 3. \end{cases}$$

We then have (still omitting the superindex 3 from $a_k$)

$$\phi_3(k) = a_{k-1}N_3(1) + a_{k-2}N_3(2) = (a_{k-1} + a_{k-2})/2 \quad\text{and}\quad \sum_k \phi_3^2(k) = \frac14\sum_k (a_k + a_{k-1})^2,$$

and, for $0 \le t \le 1$,

$$\phi_3(k+t) = a_kN_3(t) + a_{k-1}N_3(t+1) + a_{k-2}N_3(t+2) = a_kt^2/2 + a_{k-1}[3/4 - (t - 1/2)^2] + a_{k-2}(1-t)^2/2$$
$$= \tfrac12[a_k - 2a_{k-1} + a_{k-2}]t^2 + [a_{k-1} - a_{k-2}]t + \tfrac12[a_{k-1} + a_{k-2}]. \tag{4.49}$$
Hence,

$$\sum_k \phi_3^2(k+t) = \frac14\sum_k [(a_k - a_{k-1}) - (a_{k-1} - a_{k-2})]^2t^4 + \sum_k [(a_k - a_{k-1}) - (a_{k-1} - a_{k-2})](a_{k-1} - a_{k-2})t^3$$
$$+ \sum_k \left\{ (a_{k-1} - a_{k-2})^2 + \tfrac12[(a_k - a_{k-1}) - (a_{k-1} - a_{k-2})](a_{k-1} + a_{k-2}) \right\}t^2 + \sum_k (a_{k-1} - a_{k-2})(a_{k-1} + a_{k-2})t + \sum_k \phi_3^2(k).$$

But

$$\sum_k (a_{k-1} - a_{k-2})(a_{k-1} + a_{k-2}) = 0,$$

and

$$\sum_k \tfrac12(a_k - a_{k-1})(a_{k-1} + a_{k-2}) = \tfrac12\sum_k (a_k - a_{k-1})(a_{k-1} - a_{k-2}) + \sum_k (a_k - a_{k-1})a_{k-2}$$
$$= \tfrac12\sum_k (a_k - a_{k-1})(a_{k-1} - a_{k-2}) - \sum_k (a_k - a_{k-1})(a_{k-1} - a_{k-2}) + \sum_k (a_k - a_{k-1})a_{k-1}$$
$$= -\tfrac12\sum_k (a_k - a_{k-1})(a_{k-1} - a_{k-2}) - \tfrac12\sum_k (a_k - a_{k-1})^2.$$

So, setting

$$M = \sum_k (a_k - a_{k-1})^2 - \sum_k (a_k - a_{k-1})(a_{k-1} - a_{k-2}) \tag{4.50}$$

we obtain

$$\sum_k \phi_3^2(k+t) = \tfrac12Mt^4 - Mt^3 + \tfrac12Mt^2 + \sum_k \phi_3^2(k) = \tfrac12Mt^2(t-1)^2 + \sum_k \phi_3^2(k), \qquad 0 \le t \le 1. \tag{4.51}$$

We can write $M$ as $M = \|u\|\|v\| - \langle u,v \rangle$ where $\|u\| = \|v\|$, and so $M = 0$ iff $u = v$, but this would mean

$$\cdots = a_1 - a_0 = a_0 - a_{-1} = a_{-1} - a_{-2} = \cdots,$$

that is, that the points $(k,a_k)$ lie on a straight line which contradicts the exponential decay of the $a_k$'s. So $M$ is strictly positive. Therefore, $\sum_k \phi_3^2(t-k)$, $t \in [0,1]$, has a unique maximum at $t = 1/2$, $\sum_k \phi_3^2(k + 1/2) = \sum_k \phi_3^2(k) + M/32$. That is, the variance function has only isolated maxima which are at the points $k + 1/2$, $k \in \mathbb{Z}$.
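The identity (4.51) lends itself to a quick numerical sanity check. The following sketch evaluates $\sum_k \phi_3^2(k+t)$ from the piecewise form of $N_3$ and compares it with $\tfrac12Mt^2(t-1)^2 + \tfrac14\sum_k (a_k + a_{k-1})^2$, with $M$ as in (4.50). As an assumption for illustration, a finitely supported sequence replaces the exponentially decaying $a_k$; the helper names are hypothetical.

```python
def padded(a, width):
    """Zero-pad the coefficient list on both sides (a_k = 0 outside the list)."""
    return [0.0] * width + list(a) + [0.0] * width


def variance_quadratic(a, t):
    """Direct evaluation of sum_k phi_3(k + t)^2 for 0 <= t <= 1."""
    p = padded(a, 2)
    total = 0.0
    for i in range(2, len(p)):
        phi = (p[i] * t * t / 2.0
               + p[i - 1] * (0.75 - (t - 0.5) ** 2)
               + p[i - 2] * (1.0 - t) ** 2 / 2.0)
        total += phi * phi
    return total


def variance_quadratic_closed_form(a, t):
    """(1/2) M t^2 (t - 1)^2 + (1/4) sum_k (a_k + a_{k-1})^2, with M as in (4.50)."""
    p = padded(a, 2)
    d = [p[i] - p[i - 1] for i in range(1, len(p))]
    M = sum(x * x for x in d) - sum(d[i] * d[i - 1] for i in range(1, len(d)))
    base = 0.25 * sum((p[i] + p[i - 1]) ** 2 for i in range(1, len(p)))
    return 0.5 * M * t * t * (t - 1.0) ** 2 + base


a = [1.0, -0.5, 0.25, 0.1]  # hypothetical coefficients
for t in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(t, variance_quadratic(a, t), variance_quadratic_closed_form(a, t))
```

On this grid the largest value occurs at $t = 1/2$, where the excess over $\sum_k \phi_3^2(k)$ equals $M/32$.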

The cubic Battle-Lemarie wavelet. We have [e.g., Schumaker (1993), page 136]

$$N_4(x) = \begin{cases} x^3/6, & \text{for } 0 \le x \le 1, \\ \tfrac23 - \tfrac12x(x-2)^2, & \text{for } 1 \le x \le 2, \\ N_4(4-x), & \text{for } 2 \le x \le 4. \end{cases}$$

Then $\phi_4(k) = a_{k-1}N_4(1) + a_{k-2}N_4(2) + a_{k-3}N_4(3)$ which gives

$$36\sum_k \phi_4^2(k) = \sum_k (a_{k-1} + 4a_{k-2} + a_{k-3})^2.$$

Also, for $t \in [0,1]$,

$$\phi_4(k+t) = a_kN_4(t) + a_{k-1}N_4(t+1) + a_{k-2}N_4(t+2) + a_{k-3}N_4(t+3),$$

and since $N_4(t+2) = N_4(2-t)$ and $N_4(t+3) = N_4(1-t)$, we get

$$\phi_4(k+t) = a_k\frac{t^3}{6} + a_{k-1}\left[ \frac23 - \frac12(t+1)(t-1)^2 \right] + a_{k-2}\left[ \frac23 - \frac12(2-t)t^2 \right] + a_{k-3}\frac{(1-t)^3}{6}$$
$$= \frac16\left[ (a_k - 3a_{k-1} + 3a_{k-2} - a_{k-3})t^3 + (3a_{k-1} - 6a_{k-2} + 3a_{k-3})t^2 + 3(a_{k-1} - a_{k-3})t + (a_{k-1} + 4a_{k-2} + a_{k-3}) \right].$$

Then, for $0 \le t \le 1$,

$$36\sum_k \phi_4^2(k+t) = \sum_k [a_k - 3a_{k-1} + 3a_{k-2} - a_{k-3}]^2t^6 + \sum_k 2(a_k - 3a_{k-1} + 3a_{k-2} - a_{k-3})(3a_{k-1} - 6a_{k-2} + 3a_{k-3})t^5$$
$$+ \sum_k [(3a_{k-1} - 6a_{k-2} + 3a_{k-3})^2 + 6(a_k - 3a_{k-1} + 3a_{k-2} - a_{k-3})(a_{k-1} - a_{k-3})]t^4$$
$$+ \sum_k [2(a_k - 3a_{k-1} + 3a_{k-2} - a_{k-3})(a_{k-1} + 4a_{k-2} + a_{k-3}) + 6(3a_{k-1} - 6a_{k-2} + 3a_{k-3})(a_{k-1} - a_{k-3})]t^3$$
$$+ \sum_k [9(a_{k-1} - a_{k-3})^2 + 2(3a_{k-1} - 6a_{k-2} + 3a_{k-3})(a_{k-1} + 4a_{k-2} + a_{k-3})]t^2$$
$$+ \sum_k 6(a_{k-1} - a_{k-3})(a_{k-1} + 4a_{k-2} + a_{k-3})t + 36\sum_k \phi_4^2(k). \tag{4.52}$$

Set

$$A = \sum_k [a_k - 3a_{k-1} + 3a_{k-2} - a_{k-3}]^2 = \sum_k [(a_k - a_{k-1}) - 2(a_{k-1} - a_{k-2}) + (a_{k-2} - a_{k-3})]^2$$
$$= 6\sum_k (a_k - a_{k-1})^2 - 8\sum_k (a_k - a_{k-1})(a_{k-1} - a_{k-2}) + 2\sum_k (a_k - a_{k-1})(a_{k-2} - a_{k-3}).$$

Of course, $A \ge 0$, and also, if we set

$$B = -2\sum_k (a_k - a_{k-1})(a_{k-1} - a_{k-2}) + 2\sum_k (a_k - a_{k-1})(a_{k-2} - a_{k-3}),$$

then,

$$A - B := C = 6\sum_k (a_k - a_{k-1})^2 - 6\sum_k (a_k - a_{k-1})(a_{k-1} - a_{k-2}) \ge 0. \tag{4.53}$$

Actually, $C$ and $A$ are both strictly positive; if $C = 0$ then the points $(k,a_k)$ are on a straight line which is impossible as the $|a_k|$ decrease exponentially with $|k|$. If $A = 0$, then the points $(k, a_k - a_{k-1})$ are on a straight line, and this would also contradict exponential decay of the $a_k$'s [since, for some $c$ and $m$ we would have $a_k = c + mk + a_{k-1} = \cdots = ck + k(k+1)m/2 + a_0$].

Cumbersome but easy manipulation in the above expression for $\sum_k \phi_4^2(k+t)$ gives the following:

$$36\sum_k \phi_4^2(k+t) = At^6 - 3At^5 + (2A + B)t^4 + (A - 2B)t^3 + (-A + B)t^2 + 36\sum_k \phi_4^2(k)$$
$$= t^2(1-t)^2[(t^2 - t - 1)A + B] + 36\sum_k \phi_4^2(k) = t^2(1-t)^2[t(t-1)A - C] + 36\sum_k \phi_4^2(k). \tag{4.54}$$

Since $A > 0$ and $C > 0$, we have $t(t-1)A - C < 0$ on $[0,1]$ and so it follows that $\sigma_4^2(t) = \sum_k \phi_4^2(k+t)$ attains its absolute maximum on $[0,1]$, namely $\sum_k \phi_4^2(k)$, only at the points $t = 0$ and $t = 1$. That is, the variance function in the cubic spline-wavelet case attains its absolute maxima, which are strict local maxima, exactly at the points $k \in \mathbb{Z}$.
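The "cumbersome but easy manipulation" leading to (4.54) can be cross-checked numerically. The sketch below evaluates $\sum_k \phi_4^2(k+t)$ directly from the piecewise definition of $N_4$ and compares it with $\{t^2(1-t)^2[t(t-1)A - C] + \sum_k (a_{k-1} + 4a_{k-2} + a_{k-3})^2\}/36$. A finitely supported sequence is used in place of the exponentially decaying $a_k$, which is an assumption of this example, as are the function names.

```python
def n4(x):
    """Cubic B-spline N_4 on [0, 4], zero elsewhere."""
    if x < 0.0 or x > 4.0:
        return 0.0
    if x > 2.0:
        x = 4.0 - x  # symmetry N_4(x) = N_4(4 - x)
    if x < 1.0:
        return x ** 3 / 6.0
    return 2.0 / 3.0 - 0.5 * x * (x - 2.0) ** 2


def variance_cubic(a, t):
    """Direct evaluation of sum_k phi_4(k + t)^2 for 0 <= t <= 1."""
    p = [0.0] * 3 + list(a) + [0.0] * 3  # a_k = 0 outside the list
    total = 0.0
    for i in range(3, len(p)):
        phi = (p[i] * n4(t) + p[i - 1] * n4(t + 1.0)
               + p[i - 2] * n4(t + 2.0) + p[i - 3] * n4(t + 3.0))
        total += phi * phi
    return total


def variance_cubic_closed_form(a, t):
    """Right-hand side of (4.54) divided by 36."""
    p = [0.0] * 3 + list(a) + [0.0] * 3
    d = [p[i] - p[i - 1] for i in range(1, len(p))]
    A = sum((d[i] - 2.0 * d[i - 1] + d[i - 2]) ** 2 for i in range(2, len(d)))
    C = 6.0 * sum(x * x for x in d) - 6.0 * sum(d[i] * d[i - 1] for i in range(1, len(d)))
    base = sum((p[i - 1] + 4.0 * p[i - 2] + p[i - 3]) ** 2 for i in range(3, len(p)))
    return (t * t * (1.0 - t) ** 2 * (t * (t - 1.0) * A - C) + base) / 36.0


a = [1.0, -0.5, 0.25, 0.1]  # hypothetical coefficients
for t in (0.0, 0.3, 0.5, 0.9, 1.0):
    print(t, variance_cubic(a, t), variance_cubic_closed_form(a, t))
```

On this grid the value at $t = 0$ (equal to that at $t = 1$) dominates the interior values, matching the location of the maxima described above.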
4.4.4. A useful inequality. The following exponential inequality, which is based on Talagrand's (1996) inequality and relevant empirical process techniques, was used repeatedly throughout the proofs.

Proposition 10. Let $K: \mathbb{R} \to \mathbb{R}$ be either a convolution kernel that is integrable and of bounded variation, or let $K$ be a wavelet projection kernel and assume either that $\phi$ has compact support and is of bounded variation, or that $\phi$ is a Battle-Lemarie father wavelet for some $r \ge 1$. Suppose $P$ has a bounded density $f$ and let $f_n(y,j)$ be the estimator from (2.1). Given $C, T > 0$, there exist finite positive constants $C_1 = C_1(C,K)$ and $C_2 = C_2(C,T,K)$ such that, if

$$\frac{n}{2^jj} \ge C \quad\text{and}\quad C_1\sqrt{(\|f\|_\infty \vee 1)\frac{2^jj}{n}} \le t \le T,$$

then, for every $n \in \mathbb{N}$,

$$\Pr_f\left\{ \sup_{y \in \mathbb{R}} |f_n(y,j) - Ef_n(y,j)| \ge t \right\} \le C_2\exp\left( -\frac{nt^2}{C_2(\|f\|_\infty \vee 1)2^j} \right). \tag{4.55}$$


Proof. Effectively this inequality was proved in Gine and Guillou [(2002), for convolution kernels], Gine and Nickl [(2009b), for compactly supported wavelets] and Gine and Nickl [(2010), for Battle-Lemarie wavelets]. In the
present form it can be deduced, for instance, from Proposition 1 in Gine 
and Nickl (2010), using the VC-bounds and variance computations in the 
aforementioned papers, with a 2 = 2 J c 2 (ET)(||/|| 00 V 1) and A equal to a large 
constant (depending only on C,K) times (||/||oo V I) 1 / 2 . □ 